Added two new documents to the repository: CHANGELOG.md and CONTRIBUTORS.md. The CHANGELOG.md file contains a summary of the changes made in each version of the project, while the CONTRIBUTORS.md file lists all contributors to the project along with their contributions.
Added a CSpell configuration for spell-checking the contents of the repository, checked all files, and corrected any spelling mistakes. The spell-checking was integrated into the GitHub Actions workflow, so that the spelling is checked on every push and pull request to the master branch. The documentation still needs to be updated to reflect the new spell-checking process. Furthermore, removed LaTeX commands for accented characters from the bibliography file, as we are using Pybtex to handle the bibliography and it has full Unicode support.
The "master" branch was renamed to "main". All references to the "master" branch in the repository have been updated to point to "main".
Moved the logo and its source file from the docs to a separate top-level design directory. The source file was cleaned up: - Converted the title of the logo to a path, because the font is from Google Fonts and not available in the SVG. It would be possible to embed the font, but this would increase the size of the SVG significantly and would require us to include the license. - The logo was previously only available with the title. For this reason a second page was added to the SVG, which contains the logo without the title. - Named and cleaned up all objects and groups in the SVG. A PNG version of the logo without the title was also added and the favicon used in the documentation was updated to use the SVG version of the logo without the title. Previously, it was still the old "S" logo, which was from before CoRelAy was renamed from Sprincl. All references to the old logo were updated to point to the new location. The URL used in the read me was made absolute, because the read me is also used for the PyPI package and PyPI would not be able to resolve the relative URL to the logo on GitHub. Finally, the logo was added to the documentation, which previously contained a copy of the logo in the docs/images directory, but did not include it. The version contained in the docs/images directory was removed and the index page now directly references the logo in the design directory.
The project is dual-licensed under the GNU General Public License Version 3 (GPL-3.0) or later, and the GNU Lesser General Public License Version 3 (LGPL-3.0) or later. The GPL-3.0 license is in the COPYING file and the LGPL-3.0 license is in the COPYING.LESSER file. Additionally, there is a LICENSE file, which contains a note about the dual-licensing. This is, however, confusing, as GitHub does not recognize that the file is only a note about the dual-licensing and not the actual license. This commit removes the LICENSE file and adds a note about the dual-licensing to the read me. Also, the read me was cleaned up a bit to fix some of the problems that Markdownlint found.
Added a CITATION.cff file, which contains the necessary information to cite this repository. This file is based on the Citation File Format (CFF) standard. This file is supported by GitHub and results in a "Cite this repository" button on the website, which allows users to directly generate a proper citation for the repository in multiple different formats. Also added a bullet point to the changelog about the removal of the LICENSE file, which I had forgotten to do in the last commit.
Converted the CoRelAy project from a setup.py project to a uv project. For this, a pyproject.toml file was created, which is configured to do the same as the setup.py file. The source code was moved from "src/corelay" to "source/corelay". The tox configuration was updated and now uses tox-uv to run all commands via uv instead of directly creating environments. This means that all Python environments can now be run without having to install multiple Python versions manually. The configuration was also cleaned up. Support for Python 3.7 was removed, not only because it has already reached its end-of-life, but also because some of the dependencies (especially tox-uv) do not support it anymore. The two remaining supported Python versions (3.8 and 3.9) are now recorded in the ".python-versions" file, which makes it trivial to install them using uv. The GitHub Actions workflow that runs the unit tests and linters and builds the documentation was updated to point to the new locations of the source code, unit tests, and configuration files. Also, it now uses uv to run the tests and linters. The GitHub Actions workflow job that ran PyLint and Flake8 was split into two separate jobs, which allows for better parallelization and faster execution of the workflow. The "actions/checkout" action used in the GitHub Actions workflow was updated from version 2 to version 4. The GitHub Actions workflow matrix configuration for Python 3.7 was removed, as it is no longer supported by the project. Finally, the workflow was cleaned up and documented. The usage of uv still needs to be documented, but this will be done later when the entire documentation is updated. Furthermore, the following changes were made:
- The unit tests were moved from the "tests" folder to "tests/unit_tests", which clears up space for other test files.
- The tox configuration was moved from the root directory to the "tests" directory, which is more appropriate, as tox is mostly used for testing.
- The configurations for the linters PyLint and Flake8 were moved from the root directory of the repository (in the case of PyLint) and from the tox configuration file (in the case of Flake8) into the "tests/linters" directory. The CSpell configuration was also moved there.
Updated the Python dependencies in the "pyproject.toml" to their
respective latest versions. Some of the dependencies no longer support
Python 3.8 and Python 3.9 (as well as Python 3.10). For this reason, the
project was migrated to support Python 3.11, 3.12, and 3.13. The
".python-versions" file, the tox configuration, and the GitHub Actions
workflow were updated to reflect this change. One of the dependencies
("metrohash-python") requires a C++ compiler to build and uses the "c++"
command, which may not be available on all systems. To ensure that users
are not confused by this, a note was added to the README file
explaining that the "c++" command is required to build the project and
showing how to install it on Fedora (one of the systems that do not have
the "c++" command installed by default).
The configuration for Read the Docs was updated to use the latest
available versions of Ubuntu (24.04) and Python (3.12). It was also
documented.
Furthermore, the following changes were made:
- The `corelay/version.py` file was deleted as it is automatically
generated during the build process and should not be checked into
source control.
- The configuration for the GitLab CI, which was stored in the
".gitlab-ci.yml" file, was removed. The project is no longer being
hosted on GitLab, and the CI configuration is no longer needed.
Some of the unit tests were failing due to changes in the dependencies. The following changes fix these failures:
- Since NumPy 1.26, NumPy array functions are no longer actual functions. Instead, they are now wrapped in a class, which calls into the C implementation of the corresponding functions. Unfortunately, this class does not implement "FunctionType". Many of the classes in CoRelAy allow a "dtype", which is checked for consistency with the input data. For example, the "Param" class allows specifying a "dtype" and a default value. The default value is checked for consistency with the "dtype" and an exception is raised if the default value is not of the correct type. Since NumPy array functions are no longer actual functions, code like "pooling_function = Param(FunctionType, np.sum)" cannot be used anymore. In fact, this code would also not work for builtin functions like "sum", for class methods, and some other cases. For this reason, the consistency check now detects whether the user specified "FunctionType" as the "dtype" and automatically adds the following types to the list of accepted types: "BuiltinFunctionType", "BuiltinMethodType", "MethodType", "numpy.ufunc", and "type(numpy.max)" (the type of NumPy array functions is private). This makes old code work as expected and opens up new possibilities for the user, like using builtin functions and class methods.
- Functions in SciKit Learn and SciKit Image that allow single-channel or multi-channel images used to have a boolean "multichannel" constructor parameter. This parameter was deprecated in favor of specifying the axis of the channels using a new "channel_axis" parameter. A value of "None" indicates that the image is single-channel. The usage of "multichannel" was removed and replaced with the appropriate "channel_axis" parameter.
- The SciKit Learn implementation of the t-SNE dimensionality reduction algorithm, represented by the "TSNE" class, now uses PCA as the default initialization method instead of a random initialization. CoRelAy uses the "precomputed" metric, which is not compatible with PCA initialization, so an exception was raised. For this reason, the initialization method is now explicitly set to "random" in the "TSNE" class.
- The "AgglomerativeClustering" class in SciKit Learn now uses the "metric" constructor parameter instead of "affinity" to specify the distance metric. Uses of the "affinity" parameter were replaced with the "metric" parameter.
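The widened "FunctionType" check described in the first bullet can be sketched with the standard library alone. This is a minimal illustration, not CoRelAy's actual implementation: the helper name `expand_accepted_types` and the `Greeter` class are hypothetical, and the NumPy-specific types mentioned in the commit message are only referenced in a comment so that the sketch runs without NumPy.

```python
import types


def expand_accepted_types(dtype):
    """Return the tuple of types accepted for a declared dtype (hypothetical helper)."""
    declared = dtype if isinstance(dtype, tuple) else (dtype,)
    if types.FunctionType not in declared:
        return declared
    # Callables that are not plain Python functions. A NumPy-aware version
    # would also append numpy.ufunc and type(numpy.max) here.
    extra = (types.BuiltinFunctionType, types.BuiltinMethodType, types.MethodType)
    return declared + tuple(t for t in extra if t not in declared)


class Greeter:
    """Hypothetical class used only to obtain a bound method."""

    def greet(self):
        return "hello"


accepted = expand_accepted_types(types.FunctionType)

print(isinstance(sum, types.FunctionType))    # builtins are not FunctionType
print(isinstance(sum, accepted))              # accepted after expansion
print(isinstance(Greeter().greet, accepted))  # bound methods are accepted too
print(isinstance(lambda x: x, accepted))      # plain functions still pass
```

Declaring any other dtype, such as `int`, leaves the accepted types unchanged, so the expansion only affects slots that explicitly ask for functions.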
Updated the linting of the project. First of all, the Flake8 linter was removed. It is a wrapper for the PyFlakes and PyCodeStyle linters, and Ned Batchelder's McCabe script, which is used to compute the McCabe complexity. PyFlakes is a static code analyzer similar to PyLint, but far less useful. McCabe complexity is a useful metric, but it is not currently used by the project. For this reason, the Flake8 linter was removed and PyCodeStyle is now directly used instead. Also, MyPy, a static type checker for Python, PyDocLint, a docstring linter, and MarkdownLint, a linter for Markdown documents, were added to the project. The configuration for PyLint was updated by removing all options that only have their default values, as they are automatically set by PyLint. The remaining options were updated to make them less restrictive for the project. Mainly, maximum and minimum values were increased/decreased to allow for more flexibility. Flake8 was removed from the tox configuration and the GitHub Actions workflow, and the new linters were added. All errors and warnings from the linters were fixed. In particular, the following changes, among others, were made:
- The imports were sorted in all Python files. They are now categorized by standard library imports, third-party library imports, and local imports, each separated by a blank line. Each category is now sub-categorized into regular imports and "from-imports". These are not separated by blank lines. Each sub-category is sorted alphabetically.
- The docstrings were updated to now follow the Google style guide, which is easier to read and write than the NumPyDoc style. Also, this style is supported by PyDocLint, which is now used to lint the docstrings.
- Missing module docstrings, function docstrings, class docstrings, and method docstrings were added to all Python files.
- The maximum line length was set to 150 characters, which is a bit less restrictive than the previous 120 characters. Still, this should fit two files side by side on modern high-resolution monitors.
- "Click" was removed as the command line argument parser. It is a great library, but it has the disadvantage of producing a lot of PyLint errors. Instead, the built-in "argparse" library is now used, which is a bit more verbose, but it does not produce any PyLint errors.
- Variables were renamed to be more descriptive. Previously, most variable names were heavily abbreviated and some of them were not very intuitive.
- Most of the inline PyLint disables were removed and either the offending rule was directly disabled or the code was changed to not trigger the rule anymore. This was done to make the code cleaner and easier to read. For all remaining inline PyLint disables, the reason for the disable was better explained.
- Relative imports were replaced by absolute imports, because they make it easier to understand where the module is located, especially because the project has multiple modules that are named the same in different sub-packages.
- Type hints were added to all functions, methods, class attributes, and variables where necessary. Also, parameters and attributes of type string that had a specific set of possible values are now typed using the "Literal" type. This makes it possible for MyPy to check whether specified values are valid.
- In the unit tests, the fixture parameters were masking the fixture function names. This was fixed by renaming the fixture functions to "get_<fixture_name>_fixture" and specifying an explicit fixture name. The fixtures are now also explicitly scoped to the module level, although this did not cause any problems previously.
In general, the code was cleaned up to make it easier to read, understand, and maintain. Some instances of dead code were eliminated. The goal was to make the code more Pythonic and to follow the PEP 8 style guide as closely as possible, while still maintaining backwards compatibility.
This was, however, not possible in all cases and some backwards compatibility was sacrificed for the sake of improved static typing. This includes the removal of some meta classes, which were replaced by protocols (which are implicit interfaces). The most important change was how slots are defined. Previously, the slots were defined by assigning an instance of the "Slot" class (e.g., using the "Param" class, which derives from the "Slot" class) to a class attribute. The problem with this is that Python allows users to access class attributes via the class name or via an instance of the class, e.g., using "self". The exception is when an instance attribute with the same name is defined, in which case the class attribute is accessed when using the class name and the instance attribute is accessed when using the class instance. The previous version of CoRelAy exploited this by returning the "Slot" instance when accessing the attribute via the class name and returning the value of the "Slot" when accessing the attribute via the class instance. Unfortunately, MyPy cannot handle this and therefore always expects the "Slot" instance to be returned, no matter how the attribute is accessed, and thus raises an error because there is a type mismatch. This was fixed by introducing a new syntax, where slots are now defined via type hints. Slots are now defined by declaring a class attribute with a type hint using the "Annotated" type. This is a type that allows specifying a type and additional metadata. For example, a string "Param" can be defined as follows: "param: Annotated[str, Param(str, 'Default value')]". This way, MyPy can infer the runtime type of the attribute and the metadata is used to define the "Slot". The old syntax is still supported, but it is no longer recommended and may be removed in the future. The Markdown files in the project were also updated to follow the MarkdownLint style guide. A major problem in the tox configuration was fixed.
Previously, tox-uv was always using the system Python version instead of the Python versions installed via uv. This meant that the Python versions specified in the environment names and the "base_python" option were ignored. As a result, the unit tests were not run with the correct Python version and the environments that used the "base_python" option were not created with the correct Python version. This was fixed by setting the "uv_python_preference" to "only-managed". Since tox-uv could not find the "pyproject.toml" file, the tox configuration file had to be moved to the "source" directory, which is the directory where the "pyproject.toml" file is located. Furthermore, as a general clean up, the PyTest and PyCoverage configurations were moved out of the tox configuration file and into their own configuration files, which were added to the "tests/unit_tests" directory. The configuration for Sphinx was not only updated to fix the linter warnings and generally to make it more readable, but also to fix errors that arose due to the fixes that were made to the tox configuration. The "docs" environment specified a base Python version of 3.13.3, but due to the mistakes that I had made in the tox configuration, tox-uv was not using the correct Python version, but the system Python version instead. Now that the tox configuration is fixed, the correct Python version is used, which is 3.13.3. This caused some errors in the Sphinx configuration, because the "pkg_resources" module, which is part of setuptools, is deprecated and setuptools is no longer installed by default in environments as of Python 3.12. This module was used to get the source code file paths and line numbers of functions, classes, methods, etc. for the "linkcode" extension. This was replaced by the "inspect" module. Also, the configuration was updated to use the correct capitalization of the project name and to always use the current year in the copyright instead of hard-coding it.
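Looking up source locations for the "linkcode" extension with the "inspect" module could look roughly like the sketch below. It is not the project's actual configuration: `REPOSITORY_URL` and `SOURCE_ROOT` are placeholder values introduced for illustration, and only the `linkcode_resolve(domain, info)` hook name and its `module`/`fullname` keys come from Sphinx itself.

```python
import importlib
import inspect
import os

# Placeholder values; a real conf.py would point these at the actual repository.
REPOSITORY_URL = "https://example.org/repository/blob/main"
SOURCE_ROOT = os.getcwd()


def linkcode_resolve(domain, info):
    """Resolve a documented Python object to a source URL, or None if unavailable."""
    if domain != "py" or not info.get("module"):
        return None
    try:
        # Walk from the module down to the documented object.
        obj = importlib.import_module(info["module"])
        for part in info["fullname"].split("."):
            obj = getattr(obj, part)
        obj = inspect.unwrap(obj)  # look through decorators that set __wrapped__
        source_file = inspect.getsourcefile(obj)
        source_lines, first_line = inspect.getsourcelines(obj)
        if source_file is None:
            return None
        relative_path = os.path.relpath(source_file, SOURCE_ROOT).replace(os.sep, "/")
    except (ImportError, AttributeError, TypeError, OSError, ValueError):
        # Builtins, C extensions, and plain values have no retrievable source.
        return None
    last_line = first_line + len(source_lines) - 1
    return f"{REPOSITORY_URL}/{relative_path}#L{first_line}-L{last_line}"
```

Returning `None` tells Sphinx to omit the source link for that object, which keeps the build from failing on objects without Python source.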
The GitHub Actions workflow was updated to now run when pull requests or merges are made to the "develop" branch. Previously, this was commented out, because the unit tests, the linters, and the building of the documentation were not working and would therefore fail the workflow. Now that all of them are working, the workflow was updated to run when pull requests or merges are made to the "develop" branch. In the "pyproject.toml" file, the "dev-dependencies" option of the "tool.uv" section was replaced by "dependency-groups", which have recently been standardized and are now supported by the uv project manager. Instead of only having a single "dev" group, a group for building the documentation, a group for running the linters, and a group for running the unit tests were created. The "dev" group includes all of the other groups, so that all dependencies are installed when the "dev" group is installed. Also, the extra requirements were accidentally not carried over from the old "setup.py" file to the new "pyproject.toml" file. These were added again to the "project.optional-dependencies" section. The "docs" and "tests" groups were not added to the optional dependencies, because they are now included in the dependency groups. A note was added to the read me file, explaining how to install the development dependencies including all optional dependencies. Finally, the license identifier that was used in the citation file was wrong and was changed from "AGPL-3.0-or-later" to "GPL-3.0-or-later AND LGPL-3.0-or-later".
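To illustrate the "Annotated"-based slot syntax introduced in this commit, the following sketch shows how metadata attached to a type hint can be discovered at runtime while a static type checker only sees the plain type. The `Param` and `Processor` classes and the `collect_slots` helper are simplified, hypothetical stand-ins and do not reproduce CoRelAy's real classes.

```python
from typing import Annotated, get_type_hints


class Param:
    """Hypothetical stand-in for a slot definition; not CoRelAy's real class."""

    def __init__(self, dtype, default=None):
        self.dtype = dtype
        self.default = default


class Processor:
    """Hypothetical class declaring one slot via an Annotated type hint."""

    # The static type checker sees a plain str; the metadata carries the slot.
    name: Annotated[str, Param(str, "Default value")]


def collect_slots(cls):
    """Map attribute names to the Param metadata found in Annotated hints."""
    slots = {}
    for attribute, hint in get_type_hints(cls, include_extras=True).items():
        for metadata in getattr(hint, "__metadata__", ()):
            if isinstance(metadata, Param):
                slots[attribute] = metadata
    return slots


slots = collect_slots(Processor)
print(slots["name"].dtype, slots["name"].default)
```

Because the declared type and the slot metadata are kept apart, there is no longer a mismatch between what is read through the class and what is read through an instance.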
Updated the documentation to reflect the changes made to CoRelAy. First of all, the Sphinx configuration was updated:
- The "sphinxcontrib.datatemplates" extension was removed, because it was not used in the documentation.
- Added the "sphinx.ext.intersphinx" extension to add links in the documentation to external documentation, e.g., the Python standard library documentation and the NumPy documentation.
The documentation was largely extended and a bit reorganized. The API reference documentation was moved from the "reference" directory to the "api-reference" directory. This was done to better reflect the purpose of the documentation, since there are also bibliographical references in the documentation. The "Getting Started" section was expanded into a new section with multiple sub-sections: "Installation", "Basic Usage", and an "Example Project". The installation and basic usage sections were moved from the original "Getting Started" section. The "Example Project" section contains a more elaborate example of how to use CoRelAy to analyze a dataset generated by Zennit using the SpRAy workflow that can be visualized using ViRelAy. A new "Contributor's Guide" section was added with sub-sections on how to report issues and feature requests, and how to contribute code or documentation. A new "Migration Guide" section was added to help users migrate from CoRelAy v0.2 to CoRelAy v0.3. This was done because the changes made in CoRelAy v0.3 are quite extensive and it is not easy to find out what has changed and how to adapt existing code to the new version. Finally, more citations of relevant literature were added to the documentation. A custom CSS file was added to the documentation to change the alignment of the documentation text to be justified and to automatically break long words. This was done to improve the readability of the documentation and to align it with usual scientific documents.
Also, the width of the labels in the bibliography was set to a fixed size, so that the labels all have the same width. This was done because they looked inconsistent and messy before. The docstrings in the CoRelAy source code were updated to now use the ":py:*:" roles to link references to classes, functions, methods, modules, and values to their respective external documentation. The usage of type hints had to be changed in some places to accommodate the way that Sphinx AutoDoc and Intersphinx reference types. Also, some of the docstrings were updated to fix typos, improve clarity, or to add missing information. By default, Sphinx AutoDoc only documents the "__init__" method and leaves out all other special "dunder" methods, such as "__repr__". A list of the special methods that are used in the CoRelAy source code was added to the list of methods covered by Sphinx AutoDoc, so that these methods are also documented. Usage of the "Literal" type hint was completely removed from the source code, because it is not properly supported by Sphinx AutoDoc and caused more problems than it solved. The code for retrieving the source code location of type aliases to "Literal" type hints was removed from the documentation configuration Python file, as it is no longer needed. The invocation of Sphinx in tox to build the documentation was updated to include the "--fresh-env" option, so that the documentation is always built in a fresh environment. This solves some issues with the documentation build, where the documentation was not updated correctly after changes were made to the source code or the documentation itself. Besides the updates to the documentation, the read me of the repository was also updated. The introduction was extended to better explain the CoRelAy library. A section on the features of CoRelAy was added. The installation instructions and the usage section are now wrapped in a new "Getting Started" section.
The example code in the usage section was updated to reflect the changes made to the example scripts. The "Contributing" section was also updated to include a proper description and links to the contributor's guide in the documentation. Also, the example scripts were updated to reflect the coding style and docstring conventions used in the CoRelAy library. They were moved from the "example" directory to the "docs/examples" directory. This was done because the examples are part of the documentation and to keep the root directory clean and free of unnecessary clutter. Some of the "__repr__" methods in the CoRelAy source code had to be updated, because Sphinx AutoDoc uses the "__repr__" method to generate type information for parameters, which was interfering with the usage of "Annotated" type hints: since the instance of "Param" or "Task" is a parameter of the "Annotated" type hint, AutoDoc would call the "__repr__" method on them and interpret the output as the type. This is not the desired behavior, so the "__repr__" methods were updated to return a string that is more suitable for the documentation. Also, the code for retrieving the source code of a lambda expression, which was used in the "__repr__" methods, was substantially reworked to fix some edge cases that were not handled correctly before.
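The Sphinx configuration changes described in this commit (Intersphinx links, documenting additional special methods, and a copyright year that is not hard-coded) might be expressed in a "conf.py" fragment along the following lines. This is only a sketch under assumptions: the copyright holder string and the concrete list of special methods are placeholders, since the commit message does not list them.

```python
import datetime

project = "CoRelAy"
# The holder string is a placeholder; the year is computed at build time.
copyright = f"{datetime.date.today().year}, CoRelAy contributors"

extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.intersphinx",
]

# Link references to objects from external projects to their documentation.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "numpy": ("https://numpy.org/doc/stable", None),
}

# Document selected special methods in addition to regular members.
# The method list here is illustrative, not the project's actual list.
autodoc_default_options = {
    "members": True,
    "special-members": "__init__, __repr__",
}
```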
Removed unused words from the CSpell dictionary to keep it clean and relevant.
Added a new GitHub Actions workflow, which builds the project and publishes it to PyPI. This workflow is triggered when a GitHub release for a new version is created.
Now, when the old syntax for declaring slots is used, a deprecation warning will be raised. Unfortunately, Python filters out deprecation warnings by default, so developers will need to add "-W all" to command line arguments when running Python to see the warning. Hopefully, developers will use a development environment that automatically enables all warnings for them.
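The following sketch shows how such a deprecation warning can be emitted and made visible from within Python. The function `declare_slot_old_style` and the warning text are hypothetical; only the standard-library `warnings` calls are real. The `stacklevel=2` argument attributes the warning to the calling code rather than to the library function.

```python
import warnings


def declare_slot_old_style():
    """Hypothetical library function that warns about deprecated usage."""
    warnings.warn(
        "The old slot syntax is deprecated; use Annotated type hints instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # do not filter out any warning category
    declare_slot_old_style()

print(caught[0].category.__name__, caught[0].message)
```

Note that Python does show a "DeprecationWarning" by default when it is attributed to code running directly in the "__main__" module, so whether a developer sees the warning without extra flags depends on where the deprecated syntax is used.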
Added new unit tests and updated existing ones to increase the test coverage of the CoRelAy codebase to 100%. The dependencies of the CoRelAy package and of the Node.js-based linters were updated to their latest versions, since this is the last commit before the release. This was done to ensure that everything is up-to-date and works as expected. The UMAP dependency is now installed directly from the GitHub repository, since the latest version of UMAP is not yet available on PyPI, although it has already been available there since February 2025. The new version fixes some deprecation warnings of third-party libraries, which are used by CoRelAy, thus ensuring that CoRelAy does not raise any deprecation warnings itself. Pybtex, a dependency of the BibTeX extension for Sphinx, which is used to generate the bibliography in the documentation, also causes a deprecation warning when building the documentation, as it is still using the deprecated "pkg_resources" module for loading plugins. The version of Pybtex available on PyPI is 0.24.0, which is still using the "pkg_resources" module. The problem has already been fixed in the development version, but it is not yet released. Therefore, the development version is installed directly from the Git repository. If and when a successor to 0.24.0 is released, this dependency can be removed again. The optional dependencies of the CoRelAy package are now also included in the "testing" and "linting" dependency groups, so that CoRelAy can be tested and linted with the optional dependencies installed. Some minor changes were made to the codebase:
- The "Shaper" flow processor was extended to support dictionaries and string indices. It seems that these features were already expected to be present, but they were prevented by some minor mistakes: Dictionaries were not supported, because the "Shaper" processor tested for the type of the input data by testing if they implemented the "Sequence" protocol, which is not the case for dictionaries. Instead, dictionaries implement the "Mapping" protocol, which is now also tested for. String indices were not supported, because the indices were tested for implementing the "Sequence" protocol, which is also the case for strings. This meant that each character of a string was treated as a separate index, which is not the intended behavior. Now, it is also checked that the indices are not strings.
- The stack level of the warning, which is raised when the old "Slot" syntax is used, was increased to 2, so that the user can see the location of the code that is using the old syntax, instead of seeing the location of the warning itself.
- The "get_lambda_expression_source_code" function was updated to fix some use-cases in which the source code of the lambda expression was not correctly retrieved.
- The function types that "Slot" and "Plug" support have been expanded to include some types that were previously not supported.
- The "get_fully_qualified_name" function was also updated to support these new function types.
Some further code changes were made to make the code more testable. This only includes changes that remove checking code that is never executed. For example, in the "MetaTracker" meta class, the check if the class has an attribute "__bases__" was removed, since this attribute is always present in Python classes. Also, some minor adjustments were made to the docstrings of the codebase to include more information or correct the information already present. Finally, some bugs in the codebase that were discovered during the creation of the new unit tests were fixed:
- Fixed a bug in the "RadialBasisFunction" affinity processor: The formula for the Radial Basis Function (RBF) kernel was incorrect. The distance matrix was not squared, resulting in the wrong formula K(d) = exp(-d / (2 sigma^2)) instead of the correct formula K(d) = exp(-d^2 / (2 sigma^2)). This was fixed by squaring the distance matrix.
- Fixed a bug in the "Histogram" processor: The processor was using the "numpy.histogramdd" function, which computes a multivariate histogram, but the processor was meant to compute a histogram over the channels of the input data, for which the "numpy.histogramdd" function is not suitable. Instead, the "numpy.histogram" function is now used in conjunction with the "numpy.stack" function to compute the histogram over the channels of the input data. Also, the "Histogram" processor was not able to deal with channel-last data, which is now supported. The "Histogram" processor created in the "virelay_analysis.py" example script was also not working as expected: Although it was using the "numpy.histogram" function, it did not account for its return type, which is a tuple containing the histogram and the bin edges. Since we now have a working implementation of the "Histogram" processor, the example script was updated to use it.
- The root directory specified in the CSpell script was not correct, which meant that not all files were checked for spelling errors. This was fixed by updating the root directory to the correct one.
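The "Sequence"/"Mapping" pitfall behind the "Shaper" fix in this commit can be reproduced with a few lines of standard-library Python. The helpers `is_container` and `extract` are hypothetical and only mirror the checks described in the commit message; they are not the processor's actual code.

```python
from collections.abc import Mapping, Sequence


def is_container(data):
    """True for sequences and mappings, but not for strings or bytes."""
    return isinstance(data, (Sequence, Mapping)) and not isinstance(data, (str, bytes))


def extract(data, indices):
    """Pick one or several entries from a sequence or mapping (hypothetical helper)."""
    if not is_container(data):
        raise TypeError(f"Unsupported input type: {type(data).__name__}")
    if isinstance(indices, Sequence) and not isinstance(indices, (str, bytes)):
        # A real sequence of indices: look up each index separately.
        return [data[index] for index in indices]
    # A single index, including a string key, is used as-is.
    return data[indices]


sample = {"input": 1, "label": 2}
print(isinstance(sample, Sequence))          # dictionaries are not sequences
print(isinstance("input", Sequence))         # strings are sequences
print(extract(sample, "input"))              # the string is one key, not five
print(extract(sample, ["input", "label"]))   # a list of keys yields a list
```

Without the explicit string check, the key "input" would be iterated character by character, which is the behavior the commit describes as unintended.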
The GitHub Actions Workflow for the deployment of CoRelAy to PyPI was copied over from ViRelAy and still contained the installation of Node.js, which is needed in ViRelAy for the building of the frontend, but is not required for the building of CoRelAy. For this reason, the installation of Node.js was removed from the GitHub Actions Workflow for the deployment.
- this is so that one can easily install the package with a github link
- rename source -> src as per python naming convention
- make python version less restrictive (3.11.12 was released April 2025)
- fix github actions deploy.yaml, which was copied over from virelay
without corelay-specific changes
- update directory in github actions tests.yaml to reflect the top-level
pyproject.toml (by removing the --directory options)
- replace "PyLint Linter" for other linting stages with more descriptive
names
- fix typing issues:
- `corelay.processor.affinity.SparseKNN`: when `symmetric=True`, the
addition of the affinity matrix and its transpose results in the
matrix being converted to a `bsr_sparse` matrix (from
`csr_sparse`), which mypy did not like, so the affinity is now
"Any", which is also the result type of the function
- `corelay.processor.embedding.EigenDecomposition`: the type of the
`which` argument of `eigsh` was incorrect
- relabel upcoming release from 0.3.0 to 1.0.0