API tutorial

How to use the API of findmycells

Before we start:

Please have a look at this comprehensive overview of all prerequisites of a findmycells project:

  • a directory that will contain your entire findmycells project (from now on, referred to as project root directory)
  • the image data you’d like to analyze in a file format that is compatible with findmycells (currently supported formats include: .czi, .png, .tif)
  • a deepflash2 model ensemble trained to segment your image features of interest (models must be compatible with deepflash2 v0.1.7; see the deepflash2 documentation, especially the GUI tutorial, for how to train model ensembles)
  • optional: a complementary ROI file for each image that specifies the area(s) of interest in your image that shall be quantified (check out the guide in the GUI tutorial)

The following tutorial assumes you meet all of the requirements listed above.
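
If you'd like to double-check the file format requirement before you start, the following minimal sketch lists all files in a folder whose extension is not among the supported formats named above. The helper name find_unsupported_files is hypothetical and not part of findmycells, and the check only looks at file extensions. If your ROI files are stored next to your images, they will be reported as well.

```python
from pathlib import Path

# File extensions listed as supported in the prerequisites above.
SUPPORTED_EXTENSIONS = {'.czi', '.png', '.tif'}


def find_unsupported_files(image_dir):
    """Return all files below image_dir whose extension is not supported.

    Hypothetical helper for illustration; not part of the findmycells API.
    """
    image_dir = Path(image_dir)
    return sorted(
        path for path in image_dir.rglob('*')
        if path.is_file() and path.suffix.lower() not in SUPPORTED_EXTENSIONS
    )
```

An empty list means that every file found has one of the supported extensions.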

Also, please note: in general, we recommend using the GUI, as it directly displays all configuration and processing options available for each step, so you don’t have to browse through the source files to figure them out yourself. This tutorial does not cover all available options; it is intended to give you a rough idea of how to interact with the API.

Getting started - create a new findmycells project:

This tutorial is made to run in the structure provided by the findmycells GitHub repository and to render the API tutorial for the documentation webpage. Hence, some markdown cells may not be rendered correctly if you’re running them on your local machine (they will usually start with something like “:::{.callout”). Also, if your local filepaths differ, please adjust the project_root_dir accordingly.

import os
from pathlib import Path

# if you downloaded the GitHub repository and left everything in its place, this will also work for you:
dir_containing_this_notebook = Path(os.getcwd())
findmycells_repo_root_dir = dir_containing_this_notebook.parent.parent
project_root_dir = findmycells_repo_root_dir.joinpath('test_data', 'cfos_fmc_test_project')
print(f'Root directory path: {project_root_dir}')
Root directory path: /home/ds/GitHub_repos/findmycells/test_data/cfos_fmc_test_project

Now let’s use this root directory to initialize our findmycells project:

from findmycells.interfaces import API

fmc_cfos_project = API(project_root_dir = project_root_dir)
Note

If you are using the “cfos_fmc_test_project”, it already contains the subdirectories that findmycells uses to sort its data (e.g. “preprocessed_images” or “microscopy_images”). If the project root directory is empty, these top-level subdirectories are created automatically.

Note

In addition, the “cfos_fmc_test_project” also already comes with microscopy images & roi files arranged in the correct file structure. Please check out this section on how you have to arrange your image data and on how to create ROI files.

Since our image data & roi files are already organized in the expected subdirectory structure, we can import all files into our findmycells project. This creates the corresponding entries in the database, but does not yet read or load the data. The information is derived from all files available in the “microscopy_images” subdirectory. You can also re-run the following method later to update the files associated with your project: simply add new files or delete existing ones, and findmycells will automatically identify these changes and add the files to, or remove them from, your project.

fmc_cfos_project.update_database_with_current_source_files()
Warning

When you remove a file from your microscopy_images subdirectory tree and run .update_database_with_current_source_files(), findmycells will automatically remove all files associated with the corresponding file ID (e.g. all preprocessed images, segmentation masks, and also quantification results).
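
Because removed source files also lead to the removal of derived results, it can help to review what changed in your image folder before updating the database. The following is a minimal sketch that compares two snapshots of a directory tree. The helpers snapshot_files and compare_snapshots are hypothetical, rely only on pathlib, and do not use or reproduce any findmycells internals.

```python
from pathlib import Path


def snapshot_files(directory):
    """Return the set of relative paths of all files below directory."""
    directory = Path(directory)
    return {
        path.relative_to(directory)
        for path in directory.rglob('*')
        if path.is_file()
    }


def compare_snapshots(before, after):
    """Report which files were added or removed between two snapshots."""
    return {
        'added': sorted(after - before),
        'removed': sorted(before - after),
    }
```

Take one snapshot before you reorganize your files and one afterwards, and check the 'removed' entry before you update the database.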

By accessing the database associated with your API object, you can also check what files are currently associated with your project:

import pandas as pd

# to improve readability, let's convert the file_infos dictionary into a pandas DataFrame:
pd.DataFrame(data = fmc_cfos_project.database.file_infos)
   file_id  original_filename  main_group_id       subgroup_id  subject_id  microscopy_filepath                                microscopy_filetype  rois_present  rois_filepath                                      rois_filetype
0  0000     dentate_gyrus_01   experimental_group  week_01      subject_02  /home/ds/GitHub_repos/findmycells/test_data/cf...  .png                 True          /home/ds/GitHub_repos/findmycells/test_data/cf...  .zip
1  0001     dentate_gyrus_02   experimental_group  week_01      subject_02  /home/ds/GitHub_repos/findmycells/test_data/cf...  .png                 True          /home/ds/GitHub_repos/findmycells/test_data/cf...  .zip
2  0002     dentate_gyrus_01   control_group       week_01      subject_01  /home/ds/GitHub_repos/findmycells/test_data/cf...  .png                 True          /home/ds/GitHub_repos/findmycells/test_data/cf...  .zip
3  0003     dentate_gyrus_02   control_group       week_01      subject_01  /home/ds/GitHub_repos/findmycells/test_data/cf...  .png                 True          /home/ds/GitHub_repos/findmycells/test_data/cf...  .zip

Congrats! You successfully started a findmycells project and associated both images & ROI files with it!
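
Once the file infos are in a DataFrame, ordinary pandas operations can be used to explore them. The sketch below works on a stand-in dictionary that copies a few columns from the table above, so it runs without a findmycells project. It assumes that file_infos maps column names to equally long lists, which is what the DataFrame conversion shown above suggests.

```python
import pandas as pd

# Stand-in for fmc_cfos_project.database.file_infos, reduced to a few columns.
# The values are copied from the table above; the dict-of-lists layout is an assumption.
file_infos = {
    'file_id': ['0000', '0001', '0002', '0003'],
    'original_filename': ['dentate_gyrus_01', 'dentate_gyrus_02',
                          'dentate_gyrus_01', 'dentate_gyrus_02'],
    'main_group_id': ['experimental_group', 'experimental_group',
                      'control_group', 'control_group'],
    'subject_id': ['subject_02', 'subject_02', 'subject_01', 'subject_01'],
    'rois_present': [True, True, True, True],
}
df = pd.DataFrame(data=file_infos)

# File IDs that belong to the control group.
control_file_ids = df.loc[df['main_group_id'] == 'control_group', 'file_id'].tolist()

# Number of files per main group.
files_per_group = df.groupby('main_group_id')['file_id'].count().to_dict()

# File IDs for which no ROI file was found.
files_without_rois = df.loc[~df['rois_present'], 'file_id'].tolist()
```

With a real project, replace the stand-in dictionary with the file_infos attribute of your database.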

Define how image and roi files shall be read:

You have several configuration options to customize how your data is loaded. Please head over to the source code of MicroscopyReaderSpecs or ROIReaderSpecs to see exactly which options are available. If any (or all) configuration options are missing, the corresponding default values are used automatically. In this example, we define some configs for the microscopy image reader (they reflect the default values, though, and are only included for illustrative purposes), and none for the roi reader (which means that all default values are used here):

microscopy_reader_configs = {'all_color_channels': True,
                             'all_planes': True}
fmc_cfos_project.set_microscopy_reader_configs(microscopy_reader_configs = microscopy_reader_configs)
fmc_cfos_project.set_roi_reader_configs()
Note

When “configs” can be passed as an argument to a function or method in findmycells, they are always a dictionary. (The only exception is “strategy_configs”, which is a list of dictionaries, as it is a collection of multiple configs.)
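
To illustrate the idea that missing configuration options fall back to default values, here is a minimal sketch of such a merge using plain dictionaries. This is not how findmycells implements it internally. The function fill_in_defaults is hypothetical, and the defaults dictionary only contains the two keys used in this tutorial.

```python
# Hypothetical defaults for illustration; only the two keys from this tutorial are included.
DEFAULT_MICROSCOPY_READER_CONFIGS = {'all_color_channels': True, 'all_planes': True}


def fill_in_defaults(user_configs=None, defaults=DEFAULT_MICROSCOPY_READER_CONFIGS):
    """Return a complete config dict: user-provided values override the defaults."""
    user_configs = user_configs or {}
    unknown_keys = set(user_configs) - set(defaults)
    if unknown_keys:
        raise KeyError(f'Unknown config keys: {sorted(unknown_keys)}')
    # Later entries win, so user values replace the defaults with the same key.
    return {**defaults, **user_configs}


# Passing nothing yields the defaults; passing one key overrides only that key.
all_defaults = fill_in_defaults()
single_plane = fill_in_defaults({'all_planes': False})
```

Rejecting unknown keys in such a helper catches typos in option names early instead of silently ignoring them.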

Start the processing of your data:

Processing of your data always works in the same way. You choose which processing strategy (or strategies) you want to run, and you have the option to specify some general processing configurations (like which files you’d like to process or whether progress shall be autosaved) and strategy-specific configurations (where applicable). You will find a list of all available processing strategies in the corresponding “strategies” submodule of the respective processing module (e.g. in findmycells.preprocessing.strategies). Here, we will keep things simple and only run two preprocessing strategies, both with their respective default values (i.e. not passing any “strategy_configs”).

from findmycells.preprocessing.strategies import CropToROIsBoundingBoxStrat, ConvertTo8BitStrat

You can either read through the docs webpage to learn what each strategy does, or read their docstrings:

?CropToROIsBoundingBoxStrat
Init signature: CropToROIsBoundingBoxStrat()
Docstring:     
You might not be interested in analyzing the entire image, but only to quantify
image features of interest in a certain region of your image (or actually also
several regions). Now, chances are that it is possible to find a bounding box that
contains all regions of the image that you are interested in, which is, however,
smaller than the original image. Cropping your original image down to that smaller 
size will then significantly reduce computation time, required memory space, and also
required disk space. Therefore, it is highly recommended to add this strategy to your
preprocessing. You can also combine it with additional cropping strategies, like the
one that tries to remove stitching artefacts.
File:           ~/GitHub_repos/findmycells/findmycells/preprocessing/strategies.py
Type:           ABCMeta
Subclasses:     
?ConvertTo8BitStrat
Init signature: ConvertTo8BitStrat()
Docstring:     
This strategy converts your image to an 8-bit format. Adding this strategy is
at the moment mandatory, as all implemented segmentation tools (deepflash2 & cellpose)
require 8-bit as input format. So you actually don´t really have a choice but adding it! :-)
File:           ~/GitHub_repos/findmycells/findmycells/preprocessing/strategies.py
Type:           ABCMeta
Subclasses:     
fmc_cfos_project.preprocess(strategies = [CropToROIsBoundingBoxStrat, ConvertTo8BitStrat])
Note

If you don’t provide any file IDs, the processing will be run on all file IDs by default (taking into account the overwrite argument in the project_configs, which is False by default).
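
The following sketch shows one plausible reading of this behavior in plain Python: without explicit file IDs, all file IDs are candidates, and files that were already processed are skipped unless overwriting is enabled. The function select_file_ids and its parameters are hypothetical and only illustrate the logic. They are not the findmycells implementation, which may differ.

```python
def select_file_ids(all_file_ids, already_processed, requested_file_ids=None, overwrite=False):
    """Pick the file IDs to process (hypothetical helper, for illustration only).

    all_file_ids: every file ID known to the project
    already_processed: file IDs for which this processing step was run before
    requested_file_ids: optional explicit selection; None means "all file IDs"
    overwrite: if True, already processed files are processed again
    """
    candidates = list(all_file_ids) if requested_file_ids is None else list(requested_file_ids)
    if overwrite:
        return candidates
    return [file_id for file_id in candidates if file_id not in already_processed]


all_ids = ['0000', '0001', '0002']
to_process = select_file_ids(all_ids, already_processed={'0000'})
to_reprocess = select_file_ids(all_ids, already_processed={'0000'}, overwrite=True)
```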

Warning

All other processing steps in findmycells work the same way. However, the next processing step (i.e. segmentation) would require trained deepflash2 models. Since these files are too large to be appropriately hosted in this GitHub repository, we cannot continue here any further. Feel free to get in touch, though, if you have trouble with running the other processing steps!

Saving & loading your project:

To avoid losing any progress, and to be able to come back and continue your findmycells project at any time, you can easily save & load the status of your project:

fmc_cfos_project.save_status()
fmc_cfos_project.load_status()

File history:

findmycells also keeps track of how you processed your files. This information can be retrieved by accessing the file_histories attribute of the database:

file_id_of_interest = '0000'
fmc_cfos_project.database.file_histories[file_id_of_interest].tracked_history
   processing_step_id  processing_strategy         strategy_finished_at
0  preprocessing       CropToROIsBoundingBoxStrat  2023-02-20 20:33:38.766123
1  preprocessing       ConvertTo8BitStrat          2023-02-20 20:33:38.769788
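
A tracked history like the one shown above can be queried with pandas, for instance to check whether a certain strategy has already been applied to a file. The sketch below rebuilds the displayed table as a stand-in DataFrame so that it runs on its own. It assumes that tracked_history is a pandas DataFrame with these three columns; the timestamp conversion is done explicitly because the actual column type is not known here.

```python
import pandas as pd

# Stand-in for database.file_histories['0000'].tracked_history (values from the output above).
tracked_history = pd.DataFrame({
    'processing_step_id': ['preprocessing', 'preprocessing'],
    'processing_strategy': ['CropToROIsBoundingBoxStrat', 'ConvertTo8BitStrat'],
    'strategy_finished_at': pd.to_datetime(['2023-02-20 20:33:38.766123',
                                            '2023-02-20 20:33:38.769788']),
})


def was_strategy_applied(history, strategy_name):
    """Return True if strategy_name appears in the tracked history (hypothetical helper)."""
    return bool((history['processing_strategy'] == strategy_name).any())


# Strategy names in the order in which they finished.
applied_in_order = (
    tracked_history.sort_values('strategy_finished_at')['processing_strategy'].tolist()
)
```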

Continue with whole test dataset, including pretrained models:

We provide a full test dataset, including trained deepflash2 models, on the related Zenodo repository. You can easily download it using the corresponding utility function:

from findmycells.utils import download_sample_data

# Please specify a path to an empty but existing directory on your local machine:
destination_dir = Path('/add/your/path/here')

# Simply uncomment the following line to download the full test dataset. The download may take some minutes
# download_sample_data(destination_dir_path = destination_dir)
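
Since the destination has to be an empty but existing directory, a quick check before starting the download can save some time. The helper below is a hypothetical sketch based on pathlib only; whether download_sample_data performs a similar check itself is not covered here.

```python
from pathlib import Path


def is_empty_existing_directory(path):
    """Return True if path is an existing directory that contains no entries."""
    path = Path(path)
    return path.is_dir() and not any(path.iterdir())


# Possible usage together with the download shown above (left commented out on purpose):
# if is_empty_existing_directory(destination_dir):
#     download_sample_data(destination_dir_path = destination_dir)
```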