End-To-End Computer Vision Framework: An Open-Source Platform for Research and Education

Orhei, Ciprian; Vert, Silviu; Mocofan, Muguras; Vasiu, Radu

doi:10.3390/s21113691



Department of Communications, Politehnica University of Timișoara, 2, Piata Victoriei, 300006 Timișoara, Romania

* Author to whom correspondence should be addressed.

† This paper is an extended version of our paper published in Orhei, C.; Mocofan, M.; Vert, S. and Vasiu, R. “End-to-End Computer Vision Framework,” 2020 International Symposium on Electronics and Telecommunications (ISETC), Timisoara, Romania, 2020, pp. 1–4, doi:10.1109/ISETC50328.2020.9301078.

Sensors 2021, 21(11), 3691; https://doi.org/10.3390/s21113691

Submission received: 23 April 2021 / Revised: 18 May 2021 / Accepted: 22 May 2021 / Published: 26 May 2021

(This article belongs to the Special Issue Selected Papers from the International Symposium on Electronics and Telecommunications ISETC 2020)


Abstract

Computer Vision is a cross-research field with the main purpose of understanding the surrounding environment as closely as possible to human perception. Image processing systems are continuously growing and expanding into more complex systems, usually tailored to the specific needs or applications they serve. To better serve this purpose, research on the architecture and design of such systems is also important. We present the End-to-End Computer Vision Framework (EECVF), an open-source solution that aims to support researchers and teachers within the vast field of image processing. The framework incorporates Computer Vision features and Machine Learning models that researchers can use. Given the continuous need to add new Computer Vision algorithms in day-to-day research activities, our proposed framework has an advantage in its configurable and scalable architecture. Even if the main focus of the framework is on the Computer Vision processing pipeline, the framework offers solutions to incorporate even more complex activities, such as training Machine Learning models. EECVF aims to become a useful tool for learning activities in the Computer Vision field, as it allows the learner and the teacher to handle only the topics at hand, and not the interconnections necessary for the visual processing flow.

Keywords:

Computer Vision Framework; Computer Vision; pipeline architecture; benchmarking; deep learning; neural networks; reproducible research; machine learning

1. Introduction

Computer Vision (CV) is an interdisciplinary field which deals with understanding digital images or videos as well as, or even better than, humans do. The main tasks of CV are acquiring, processing, analyzing, and understanding the environment through digital images [1]. Some high-level problems which are successfully tackled by CV are optical character recognition (OCR) [2,3,4], machine vision inspection [5,6], 3D model building (photogrammetry) [7,8], medical imaging [9,10], automotive safety [11], motion capture, surveillance [12], fingerprint recognition [13,14], face recognition, and gesture recognition [15,16,17].

As CV topics evolved over time (see Figure 1), so did the complexity of the required architectures. Pipelines can include image acquisition from image sensors; preprocessing to enhance the image, such as noise reduction; selection of a region of interest, such as background subtraction; feature extraction that reveals lines, edges, shapes, and textures; high-level processing relevant to the application; and finally, decision-making, such as classifying an object as a car [1,18,19].

Complex open research topics in the CV domain are nowadays solved using Machine Learning (ML), Neural Networks (NN), or Deep Learning (DL) techniques. These techniques solve problems by learning patterns from huge amounts of data and retaining the gained knowledge in model structures and weighted parameters, with the aim of solving both trivial and corner-case challenges [20,21].

This natural blend of Artificial Intelligence (AI) into solving complex CV topics is evidenced by the increasing number of AI papers on arXiv, both overall and in several subcategories. Figure 2 shows the number of AI papers on arXiv by each paper’s primary subcategory, and we can observe that the increase is constant in both the AI domain and the CV domain.

Considering the long-term evolution of CV systems and the increasing number of applications, multiple implementation solutions have been used to obtain the desired pipelines. Optimization of a CV system varies from system to system and takes multiple parameters into account. For example, a low-power system for mobile phones may have a limited GPU and a limited CPU SIMD instruction set, which forces a thinning of the pipeline. In any case, however, the modern CV pipeline is a blended system that includes AI elements, such as ML or DL, and classical image processing activities, as we can see in Figure 3.

Even if the programming language is just a tool for implementing an image processing algorithm, choosing the best-suited one is not a trivial task. Usually, this choice is dictated by environment conditions, such as the hardware on which the application will run, the purpose of the application, the sensors it needs to interact with, and so on.

We propose a Python-based CV framework, the End-to-End CV Framework (EECVF) (https://www.cm.upt.ro/projects/eecvf, accessed on 22 April 2021) [19,23], whose purpose is to run custom pipelines designed by the user. The open-source framework can be found in the GitHub repository https://github.com/CipiOrhei/eecvf (accessed on 22 April 2021), under the MIT License.

EECVF provides users with built-in CV algorithms for image manipulation, image smoothing, edge detection, line detection, and semantic segmentation. Even if certain jobs are provided by default, users will discover that adding new features is an easy task in our proposed framework. The provided features are developed using Python-based open-source libraries. To meet current CV pipeline requirements, the framework offers ML models for edge detection and semantic segmentation that users can retrain as needed. The framework has a separate block that handles AI activities, in which users can find the existing models or add their own. We use the generic AI term because, from an architectural point of view, the inner details of the model used (NN or ML) are not visible outside the block (to the user or to other blocks).

The term “End-to-End” points out the ability of the framework to execute multiple steps of a CV process as a one-click solution. The framework has the capability to handle the complete cycle of create–train–evaluate of an AI model, to run a CV application using the model, and in the end, to evaluate the results and plot them without any user intervention. The assumption of EECVF is that tuning the model, running the CV application, and evaluating the results do not run concurrently. All these steps are accompanied by the debug information that is requested by the user.

EECVF has proven to be a useful software tool for day-to-day research and teaching activities. Using EECVF shifts the attention from developing and maintaining the interaction between software components, pipeline manipulation, and tooling to the CV research itself. The modularity and scalability of the framework facilitate the continuous development of concepts that, we consider, will keep EECVF up-to-date in the ever-changing CV world.

This work is an extension of our presentation of EECVF in [19], and it is structured as follows. In Section 2, we present similar frameworks found in the scientific literature and a comparison between them. In Section 3, we describe our proposed framework, EECVF, followed by a use case example in Section 4. In Section 5, we present the benefits of using this framework in learning activities, and in Section 6, we draw conclusions upon our work.

2. Related Work

In this section, we present similar frameworks that we found in the research literature; a comparison is summarized in Table 1. Our analysis highlights the main points of each entry, as well as the year it was released and the programming language it was developed in.

The presented state-of-the-art review focuses mainly on frameworks that can handle more generic applications of the CV pipeline, even if they are focused only on the ML or DL part. We excluded frameworks that only focus on one use-case to solve or handle a certain architecture.

All the frameworks described in this section, including our EECVF, use external image processing toolboxes or libraries, but from our point of view these are two different aspects. Libraries are useful resources and have been developed over time in several programming languages, but they focus on solutions for certain problems and less on the overall system view. Frameworks, compared to libraries, should be more concerned with the big picture of the CV pipeline flow. Examples of popular libraries are MathWorks [24] (for Matlab or C/C++), OpenCV [25] (for Python, Java or C++), CImg [26] (for C++), MatlabFns [27] (for Matlab or Octave), and OpenAI [28] (for Python).

Table 1. Comparison of different CV frameworks.

Framework	Year	Language	Main Points
WEKA [29]	1994	C/C++	Focused on ML and Data Mining algorithms; Open source; GUI for developing ML data flows
WEKA3 [30]	2009	Java	Focused on ML and Data Mining algorithms; Open source; GUI for developing ML data flows
Rattle [31,32]	2011	R	Focused on ML and Data Mining algorithms; Open source; GUI for developing ML data flows
DARWIN [33]	2012	C++	Open source; CV and ML algorithms; Python wrapper; GUI for developing ML data flows
FIJI [34]	2012	Java	Open source; CV algorithms; Biological-image analysis specific; Plug-in to Matlab, ITK and WEKA
eMZed [35]	2013	Python	Open source; Specialized for liquid chromatography/mass spectrometry (LC/MS) data analysis; GUI for non-programmer users
AVS [36]	2015	C/C++	Open source; CV algorithms with clear presentation for teaching; GUI for non-programmer users
HetroCV [37]	2016	C/C++	Programmer-directed auto-tuning framework; CV and ML algorithms on CPU-MIC platform; Focused on run-time HW optimizations
DeepDIVA [38]	2018	Python	Accessible as Web Service through DIVAServices; Open source; DL and ML algorithms
Chainer [39]	2019	Python	Deep Learning Framework for Accelerating the Research Cycle; Open source; CV and ML algorithms

Waikato Environment for Knowledge Analysis (WEKA) is not a single program, but rather a set of tools bound together by a common user interface. In designing the interfaces, the authors took the view that a tool will ultimately reside alongside other end-user applications such as spreadsheets, word processors, and databases [29]. The original WEKA framework was developed in C and C++, while the latest version, WEKA3, is developed in Java and is now used in many different application areas, in particular for educational purposes and research [30].

The R Analytical Tool To Learn Easily (RATTLE) is a graphical data mining application written in R language. The main goal of this framework is to ease the transition from basic data mining to sophisticated data analyses using a powerful statistical language. Rattle’s user interface provides an entree into the power of R as a data mining tool [31,32].

The DARWIN framework is a twofold system based on the C++ programming language that aims to provide infrastructure for students and researchers to experiment with and extend state-of-the-art methods. It provides infrastructure for data management, logging, and configuration, with a consistent and well-documented interface for machine learning practitioners [33].

Fiji is an open-source software framework that focuses on image analysis in biology. The framework combines powerful software libraries with a range of scripting languages to enable rapid prototyping of image processing algorithms. Fiji is a distribution of the popular open-source software ImageJ [34].

eMZed is a Python-based framework for mass spectrometry users who want to create tailored workflows for liquid chromatography and data analysis. The framework specifically addresses non-expert programmers, with the goal of establishing a comprehensive list of basic functionalities [35].

Adaptive Vision Studio (AVS) is a software tool for creating image processing and analysis algorithms. This framework has been tested in a post-graduate computer vision course in Automatic Control and Biotechnology at the Silesian University of Technology. AVS has proven to be a powerful environment with ready-to-use image analysis filters for both computer vision experts and beginners [36].

HetroCV is an auto-tuning framework and runtime for image processing applications on heterogeneous CPU-MIC platforms. In HetroCV, image processing pipelines are composed of computation units such as Map, Stencil, and MapReduce. The main benefit of this framework is that it uses program statistics extracted from the computation units to predict the optimal tuning parameters online [37].

The DeepDIVA framework is designed to enable a quick and intuitive setup for experiments that should be reproducible for analysis. The framework offers functions for keeping track of experiments, hyperparameter optimization, and visualization of data and results [38].

The Chainer framework is a flexible, intuitive, and high-performance means of implementing the full range of deep learning models for researchers and practitioners. The framework provides GPU acceleration using familiar Python libraries [39].

From our literature review, we can conclude that no framework specifically handles the full pipeline. For example, we can imagine a use case where we desire to train a specific edge detection ML model, use it inside a CV application together with a classical edge detection algorithm and evaluate the edge results at the end.

Furthermore, most of the presented frameworks focus more on ML algorithms and provide tools to understand the inner workings of the models. Chainer [39] even goes one step further and focuses on the way one can optimize the ML pipeline on a specific accelerator.

An interesting aspect is that recent framework solutions have chosen Python as the main programming language. The main benefits of this choice are the easy interaction with the OS environment (Linux or Windows) and the considerable number of existing libraries with ML, DL, or CV algorithms.

3. Proposed CV System

EECVF is an easy-to-use, modular, and flexible framework designed for researching and testing CV concepts. The framework does not require the user to handle the interconnections throughout the system. Users do not need to concern themselves with the strategies used for transferring data (input or processed) from one job to the next or from one block to another (e.g., from the Application block to the Benchmark block). We consider the framework easy to use because the system offers the user jobs and services to configure the desired pipeline.

EECVF is constructed in a modular programming software design fashion, with all functional components being independent. This was a relevant aspect when we constructed the framework, as we desired to allow the users to use just one block or several blocks in their desired activities.

All the components, whether high-level blocks or low-level jobs, respect the concept of data coupling. We aimed for loose coupling between software elements so that dependencies would be minimal and the data flow slim. This aspect of the construction makes the functionality of EECVF scalable. New features, such as a job or a service, can be easily integrated in the framework by users, without any need for refactoring, changing, or adapting existing features or concepts.

Flexibility of a software system can be defined as how easy it is to reuse or increase the functionality of classes or modules in similar contexts without the need to redesign or modify the existing architecture [40]. The proposed framework is constructed upon this concept. The high-level architecture of EECVF easily permits adding new jobs in any of the component blocks without modifying the existing modules, except for the block interfaces that expose the new job to other blocks and users. This concept is reinforced by the fact that a functional block representing a job can be used to create a new job without any modification to the reused job.
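To make the reuse argument concrete, here is a minimal Python sketch, assuming a hypothetical decorator-based job registry. The names JOB_REGISTRY, register_job, box_blur, and unsharp_mask are illustrative only and are not EECVF's API. A new unsharp-masking job is built on top of an existing blur job, which stays untouched:

```python
from typing import Callable, Dict, List

Image = List[List[float]]

# Hypothetical registry mapping a job name to the function that implements it.
JOB_REGISTRY: Dict[str, Callable[..., Image]] = {}


def register_job(name: str) -> Callable[[Callable[..., Image]], Callable[..., Image]]:
    """Decorator that exposes a function as a named job (illustrative only)."""
    def decorator(func: Callable[..., Image]) -> Callable[..., Image]:
        JOB_REGISTRY[name] = func
        return func
    return decorator


@register_job("box_blur")
def box_blur(img: Image) -> Image:
    """Existing job: 3x3 mean filter with edge replication. Never modified below."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate border rows
                    xx = min(max(x + dx, 0), w - 1)  # replicate border columns
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


@register_job("unsharp_mask")
def unsharp_mask(img: Image, amount: float = 1.0) -> Image:
    """New job built on top of box_blur: img + amount * (img - blur(img))."""
    blurred = JOB_REGISTRY["box_blur"](img)
    return [
        [p + amount * (p - b) for p, b in zip(row, blurred_row)]
        for row, blurred_row in zip(img, blurred)
    ]
```

Adding unsharp_mask only required registering a new name. box_blur and every job that already used it are unchanged, which is the property the paragraph above describes.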

In recent years, Python, an interpreted, high-level programming language, has become the de facto standard for exploratory, interactive, and computation-driven scientific research [41]. We chose Python because of its capability to interconnect multiple blocks of our environment and to switch smoothly between operating systems. To make EECVF easier to set up, users can simply run the setup_framework Python module, which installs all the required libraries and dependencies.

Another reason for choosing Python as the main programming language for EECVF is its capability to interconnect sensors with the system easily. We considered this aspect important in the construction of EECVF, as we want to permit sensors to inject input data streams directly into the pipeline. As a naive example, users can find the example_main_camera_video module, where the pipeline is configured to obtain data directly from a video camera connected to the system.

3.1. High-Level View

One of the desired outcomes of the framework is to unify different stages of the vast CV research domain. Figure 4 shows the blocks forming EECVF. Treating all the blocks of the pipeline as one component, we reduce the number of redundant operations and calculations by eliminating duplication of data and interfaces throughout the system.

The framework follows the Facade design principle; the User interface block acts as a Facade to the entire application. The Facade principle states that a complex subsystem should provide an interface that limits interactions with the lower layers of the architecture, offering a simple channel of communication between the user and the software functionality [42]. In the proposed framework, the user is not required to interact with the second layer (AI, Application, Benchmark, or Data Processing) in order to use the application. Every job or service provides a method whose purpose is to hide from the user the sub-system actions a block needs for execution.

Jobs and services from each block are exposed to the rest of the system via interfaces. This mechanism isolates the inner workings of each element from the user and from the other elements. As such, users only need to focus on the research topic at hand and not on the tools they have to use.
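The Facade idea can be sketched in a few lines of Python. This is a minimal illustration, not EECVF's code: the classes UserInterface, ApplicationBlock, and BenchmarkBlock and the methods add_blur, add_edge_detection, and run_and_evaluate are hypothetical names. The user only calls the facade, which forwards the work to the second-layer blocks:

```python
from typing import Any, Dict, List, Tuple


class ApplicationBlock:
    """Second-layer block (hypothetical): stores and runs pipeline jobs."""

    def __init__(self) -> None:
        self._jobs: List[Tuple[str, Dict[str, Any]]] = []

    def add_job(self, name: str, **params: Any) -> None:
        self._jobs.append((name, params))

    def run(self) -> List[str]:
        # Stand-in for real processing: report which jobs ran with which parameters.
        return [f"{name}{sorted(params.items())}" for name, params in self._jobs]


class BenchmarkBlock:
    """Second-layer block (hypothetical): evaluates application outputs."""

    def evaluate(self, outputs: List[str]) -> Dict[str, int]:
        return {"evaluated_outputs": len(outputs)}


class UserInterface:
    """Facade: the only object the user talks to; it hides the blocks behind it."""

    def __init__(self) -> None:
        self._application = ApplicationBlock()
        self._benchmark = BenchmarkBlock()

    def add_blur(self, kernel_size: int = 3) -> None:
        self._application.add_job("blur", kernel_size=kernel_size)

    def add_edge_detection(self, low: int = 50, high: int = 150) -> None:
        self._application.add_job("edges", low=low, high=high)

    def run_and_evaluate(self) -> Dict[str, int]:
        outputs = self._application.run()
        return self._benchmark.evaluate(outputs)
```

A user script would contain only calls such as `ui = UserInterface(); ui.add_blur(5); ui.run_and_evaluate()`. How the blocks are wired together stays behind the facade.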

To understand how EECVF inner architecture works, we need to define the terms:

a job is an action with an added value for the CV pipeline or AI module, typically an output;
a service is an action that ensures the proper functionality and configuration of the framework;
a port is defined as a channel for the data that is passed between jobs or blocks of the framework; and
a wave is a full execution of the pipeline for one frame of the input data.
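The terms defined above can be modeled with a few data structures. The following is a minimal sketch under stated assumptions. The Port and Job dataclasses and the run_wave function are hypothetical and are not EECVF's internal classes. Services are omitted, and jobs are assumed to arrive already ordered by their dependencies:

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Port:
    """Channel carrying data between jobs; 'valid' marks whether it was filled this wave."""
    name: str
    data: Optional[Any] = None
    valid: bool = False


@dataclass
class Job:
    """Action adding value to the pipeline: reads input ports, fills output ports."""
    name: str
    inputs: List[Port]
    outputs: List[Port]
    run_fn: Callable[..., Any]

    def run(self) -> None:
        if not all(port.valid for port in self.inputs):
            return  # an input is missing this wave, so the job does not run
        result = self.run_fn(*(port.data for port in self.inputs))
        for port in self.outputs:
            port.data, port.valid = result, True


def run_wave(frame: Any, source: Port, jobs: List[Job]) -> None:
    """One wave: a full pass of the pipeline over a single input frame."""
    for job in jobs:
        for port in job.outputs:
            port.valid = False  # results from the previous wave are stale
    source.data, source.valid = frame, True
    for job in jobs:  # jobs are assumed to be already sorted by dependency
        job.run()
```

For example, an "invert" job reading a RAW port and writing an INVERTED port produces `[255, 0, 245]` for the frame `[0, 255, 10]` in a single wave.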

Every job offers a public method exposed to the user, which can be configured via parameters. This method handles the necessary changes in the system configuration and triggers other necessary jobs from one or several blocks, depending on the nature of the job. As we can see in Figure 5, besides the method for the user interface, a job can have multiple private methods that handle the functionality, with or without external Python libraries.

Users can opt for only one of the blocks or for several of them: for example, only the Application block to run a simple CV pipeline, or the Application block and the Benchmark block to execute the pipeline and evaluate the results.

3.2. AI Block

This block of the framework handles the training and customization of ML, NN, or DL models. An overview of the block can be seen in Figure 6. With this block, EECVF aims to isolate these specific activities from the CV pipeline. This separation is needed because these activities take place before the model is used in any pipeline.

In this block, users can find jobs that trigger ML semantic segmentation models or edge detection models, which can be trained according to the needs of the desired pipeline. Jobs from this block respect the principle presented in Section 3.1, so the user can easily configure the training using method parameters.

Users can define their own AI models or they can use third-party models already configured in EECVF. Due to the vast model variants and use case variants, it will be the user’s responsibility to define the model and the inner workings of the training process inside the block.

EECVF provides jobs that help interconnect the AI block with the other blocks. An example of such an interconnection would be to set up a job in the Application block that uses the model checkpoint files generated after training. Another job that can be exported to a different block of the framework is model validation, which can be done inside this block or using the Benchmark block.
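The checkpoint hand-off between the AI block and the Application block can be illustrated with a toy example. This is a minimal sketch, assuming a one-parameter threshold "model" and a JSON checkpoint file. Both functions, train_threshold_model and apply_model_job, are hypothetical and stand in for real training and inference code:

```python
import json
import os
import tempfile
from typing import List


def train_threshold_model(pixels: List[float], labels: List[int], checkpoint: str) -> float:
    """AI-block job (hypothetical): fit a one-parameter edge classifier, save a checkpoint."""
    edge = [p for p, lab in zip(pixels, labels) if lab == 1]
    background = [p for p, lab in zip(pixels, labels) if lab == 0]
    # Decision threshold halfway between the two class means.
    threshold = (sum(edge) / len(edge) + sum(background) / len(background)) / 2
    with open(checkpoint, "w", encoding="utf-8") as fh:
        json.dump({"threshold": threshold}, fh)
    return threshold


def apply_model_job(frame: List[float], checkpoint: str) -> List[int]:
    """Application-block job (hypothetical): load the checkpoint, label each pixel."""
    with open(checkpoint, encoding="utf-8") as fh:
        threshold = json.load(fh)["threshold"]
    return [int(p > threshold) for p in frame]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "edge_model.json")
    train_threshold_model([10, 20, 200, 220], [0, 0, 1, 1], ckpt)
    print(apply_model_job([100, 150], ckpt))  # [0, 1]
```

The only contract between the two blocks is the checkpoint file. Training code and pipeline code never import each other, which mirrors the separation described above.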

For data augmentation, we can use the Application block. By using services provided by EECVF, we can set the exported data as input to our jobs in the AI block. Of course, users can configure the desired jobs in this block to use the augmentation functions offered by Python libraries directly but, for a better observation of the augmentation, we use the Application block.

A benefit of EECVF is the fact that it is not dependent on any particular library for this block. Users can integrate models constructed with any library as long as it is Python-based or provides a Python wrapper.

3.3. Application Block

The Application block handles the CV pipeline in the framework. This block configures the actual order of execution for all triggered jobs and ports that the user describes in order to simulate a use case. The internal design of the Application block is presented in Figure 7. To set up a pipeline, users need to configure jobs with input and output ports, with the input data (images, videos, or any vector data format), and with the schedule for the pipeline.

As stated before, we constructed the proposed framework to be modular, scalable, and flexible. To enforce this concept, the Application block has processes that remove duplicate operations, schedule jobs, and execute algorithms only when needed.

The Proxy design pattern means that a resource is loaded only when it is needed. The Application block follows this approach by executing jobs and services later in the process [42]. This technique allows users to specify what computations they want to make (like do_gaussian_blur_image_job, do_canny_fix_threshold_job, or do_ed_lines_job) and lets the Application block schedule them after removing duplicates and jobs with missing ports, creating a slim pipeline of CV operations. The actual job execution comes at the final stage of the entire sub-process.
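The deferred-execution idea can be illustrated with a small Python class. This is a minimal sketch, not EECVF's scheduler. The DeferredPipeline class is hypothetical, and for brevity duplicates are detected by request name. As described below, EECVF itself compares ports and functions rather than names. Requests are only recorded, and nothing is computed until run() is called:

```python
from typing import Callable, Dict, List, Tuple


class DeferredPipeline:
    """Records requested computations; nothing executes until run() is called."""

    def __init__(self) -> None:
        self._requests: List[Tuple[str, Callable[[float], float]]] = []
        self.executed: List[str] = []

    def request(self, name: str, func: Callable[[float], float]) -> None:
        # Only the request is stored here; the work itself is deferred.
        self._requests.append((name, func))

    def run(self, value: float) -> Dict[str, float]:
        results: Dict[str, float] = {}
        for name, func in self._requests:
            if name in results:
                continue  # duplicate request: computed once, reused
            results[name] = func(value)
            self.executed.append(name)
        return results
```

Requesting "double" twice and "square" once executes nothing until `run(3.0)`. That call then returns `{"double": 6.0, "square": 9.0}` after computing each result exactly once.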

An important intermediate step in the construction of the pipeline is the Job parsing algorithm. This algorithm avoids duplicate jobs and prepares the system for the scheduling phase. As shown in Figure 7, this is an important step prior to running the pipeline. The output of Job parsing is a JSON file that contains the slimmest possible list of jobs and ports to be used in the pipeline. The priority of jobs is also determined at this point, according to the availability of their input data.

We can observe the inner workings of the job parsing algorithm in Algorithm 1. The aim of the algorithm is to sort all the jobs in the pipeline such that no input port is missing for a job. Clustering the jobs into processed jobs (the ones with input ports allocated) and unprocessed jobs (the ones without input ports allocated) therefore results in inactivating any job with missing inputs. The algorithm stops when the number of unprocessed jobs does not change between two consecutive iterations. The second part of the algorithm verifies the fields of the jobs (processing level, active, name, input ports, output ports, init function, main function, and so on), so that we avoid executing duplicates. This logic remains valid even for a huge number of jobs and supports future optimization of the pipelines.

The JSON file resulting from job_parsing is the description of the pipeline, with each job, its input ports, output ports, initialization function, and run function.

The initial phase of the Job Parsing algorithm is important because it orders the jobs so that no job is executed before its input ports are available. Next, the algorithm ensures that all jobs that have the necessary inputs can run. The next step is also important because it eliminates duplicated jobs. Eliminating duplicated jobs does not take into consideration the job name given by the user, but the input/output ports and the initialization and run functions of each job.

Algorithm 1 Job Parsing Algorithm

1: Find input job ▹ Job with no input, only output ports
2: Process_lvl = 1 ▹ jobs which process input data have Process_lvl = 0
3: while jobs to process do
4:     Cluster processed jobs
5:     Cluster unprocessed jobs
6:     for job in unprocessed cluster do
7:         if input_ports of job found in output_ports from processed cluster then
8:             Add Process_lvl to job
9:         end if
10:    end for
11:    Increment Process_lvl
12: end while
13: Set unprocessed jobs as inactive
14: Sort jobs by Process_lvl ascending
15: for Process_lvl in jobs do
16:     if duplicate job then ▹ job name is not considered in this verification
17:         Set job as inactive
18:     end if
19: end for
20: Write active jobs to JSON file
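Algorithm 1 can be expressed as a short, runnable Python function. This is a minimal sketch under stated assumptions. Each job is a dict with the hypothetical keys name, inputs, outputs, init, and main, and the emitted JSON layout is illustrative; EECVF's actual file format may differ:

```python
import json
from typing import Any, Dict, List


def parse_jobs(jobs: List[Dict[str, Any]]) -> str:
    """Order jobs by processing level, deactivate unusable or duplicate jobs, emit JSON."""
    jobs = [dict(job) for job in jobs]  # work on copies
    for job in jobs:
        # Input jobs (no input ports, only outputs) get level 0; the rest are unprocessed.
        job["level"] = 0 if not job["inputs"] else None
        job["active"] = True

    level = 1
    while True:
        processed = [j for j in jobs if j["level"] is not None]
        unprocessed = [j for j in jobs if j["level"] is None]
        available = {port for j in processed for port in j["outputs"]}
        ready = [j for j in unprocessed if set(j["inputs"]) <= available]
        if not ready:  # unprocessed count unchanged between iterations: stop
            break
        for j in ready:
            j["level"] = level
        level += 1

    for job in jobs:
        if job["level"] is None:
            job["active"] = False  # some input port is never produced

    active = sorted((j for j in jobs if j["active"]), key=lambda j: j["level"])
    seen = set()
    result = []
    for job in active:
        # The user-given name is ignored: duplicates share ports and functions.
        signature = (tuple(job["inputs"]), tuple(job["outputs"]), job["init"], job["main"])
        if signature in seen:
            job["active"] = False
            continue
        seen.add(signature)
        result.append(job)
    return json.dumps(result, indent=2)
```

A job whose input port is never produced ends up inactive. A second blur job with identical ports and functions is dropped even though its name differs, as line 16 of Algorithm 1 specifies.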

Removing duplicates early in the process leads to a Flyweight design approach. Sharing common parts between multiple CV jobs keeps RAM usage as low as possible and improves the overall runtime, because duplicate operations are avoided [42]. For instance, when three jobs (e.g., CV algorithms) each need to compute a Gaussian filter, the Gaussian algorithm is actually computed only once. Of course, this depends on the level of granularity that the functional implementation offers.
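The Gaussian example can be reproduced with a shared store of intermediate results, in the spirit of the Flyweight approach described above. This is a minimal sketch. The SharedPorts class, its (operation, parameters) keys, and the [1, 2, 1]/4 kernel used as a stand-in for a Gaussian filter are all assumptions for illustration:

```python
from typing import Callable, Dict, Hashable, List, Tuple


class SharedPorts:
    """Keeps one copy of each intermediate result, keyed by (operation, parameters)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, Hashable], List[float]] = {}
        self.computations = 0

    def get_or_compute(self, op: str, params: Hashable,
                       compute: Callable[[], List[float]]) -> List[float]:
        key = (op, params)
        if key not in self._store:
            self._store[key] = compute()
            self.computations += 1
        return self._store[key]


def smooth(signal: List[float]) -> List[float]:
    """[1, 2, 1] / 4 smoothing with edge replication, a small stand-in for a Gaussian."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
            for i in range(n)]


frame = [0.0, 0.0, 4.0, 0.0, 0.0]
ports = SharedPorts()


def blurred_frame() -> List[float]:
    # Every job asks for the same (operation, parameters) key, so smoothing runs once.
    return ports.get_or_compute("gaussian", ("frame_0", 1.0), lambda: smooth(frame))


gradient = [abs(b - a) for a, b in zip(blurred_frame(), blurred_frame()[1:])]  # job 1
mask = [v > 0.5 for v in blurred_frame()]                                    # job 2
energy = sum(v * v for v in blurred_frame())                                 # job 3
print(ports.computations)  # 1
```

Three consumer jobs read the same smoothed frame, but only one smoothing pass is computed and stored.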

Every job in the Application block has three phases: initialization, run, and termination. The initialization phase of each job is executed once, at the beginning of the pipeline. For each wave, the run phase of the jobs is executed according to the selected scheduler. An important aspect of this step is the mechanism that avoids the duplication of ports, a mechanism that becomes more important as the quantity of data increases. At the end of the pipeline, the termination phase is triggered for each job.

In Algorithm 2, we describe the flow of a generic Application algorithm. As we can observe, the main function of this block is focused more on the execution environment than on the functionality that is executed. This is an important aspect for the flexibility of this block directly, and of EECVF indirectly.

We can observe from Algorithm 2 that the debug information (e.g., execution time of jobs, debug data, and port listing) is handled both for each wave and for the whole CV pipeline. The actual execution of the jobs is configured by the Scheduler, which takes in a list of jobs.

Algorithm 2 Application Algorithm

1: Start timers, loggers
2: if input flow exists then
3:     Configure Application according to input source
4: end if
5: if config_json exists then
6:     for job in job_list do
7:         Create job
8:         for port of job do
9:             if port does not exist then
10:                Create port
11:            end if
12:            Link job to port
13:        end for
14:    end for
15:    for job in job_list do
16:        Run Job.Init()
17:    end for
18:    while wave do
19:        Scheduler(job_list)
20:        Save debug data, ports
21:    end while
22:    for job in job_list do
23:        Run Job.Terminate()
24:    end for
25: end if
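The init/run/terminate life cycle and the per-wave debug data of Algorithm 2 can be sketched as follows. This is a simplified illustration. The Job, scheduler, and run_application names are hypothetical, port creation and input-source configuration are omitted, and the scheduler is assumed to be purely sequential:

```python
import time
from typing import Any, Dict, List, Optional, Tuple


class Job:
    """Minimal job with the three phases used by the Application block."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: List[str] = []

    def init(self) -> None:
        self.events.append("init")

    def run(self, frame: Any) -> None:
        self.events.append(f"run:{frame}")

    def terminate(self) -> None:
        self.events.append("terminate")


def scheduler(jobs: List[Job], frame: Any) -> Dict[str, float]:
    """Sequential scheduler: runs each job once and records its run time in seconds."""
    timings: Dict[str, float] = {}
    for job in jobs:
        start = time.perf_counter()
        job.run(frame)
        timings[job.name] = time.perf_counter() - start
    return timings


def run_application(config: Optional[Dict[str, List[str]]],
                    frames: List[Any]) -> Tuple[List[Job], List[Dict[str, float]]]:
    if config is None:  # no job_parsing output: nothing to run
        return [], []
    jobs = [Job(name) for name in config["job_list"]]  # create phase
    for job in jobs:
        job.init()  # init phase, once per job
    debug_log = [scheduler(jobs, frame) for frame in frames]  # one wave per frame
    for job in jobs:
        job.terminate()  # termination phase, once per job
    return jobs, debug_log
```

With two frames, each job records `["init", "run:f0", "run:f1", "terminate"]`, and the debug log holds one timing dictionary per wave.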

In Figure 8, we present the sequence diagram for the execution of the CV Application. As stated before, we can observe that Job Parsing is an important activity in the process, and that the application cannot run without the JSON file it provides. We need to differentiate the create_job phase from the init_job phase. The create_job phase refers to the creation of the job object with its output ports and other inner attributes, while init_job refers to the initialization of the functionality that the job offers. Init_job and run_job interact with the ports on every call, even if this is not shown in the diagram. The handling of all the ports is done strictly by a port_handle component; this isolation ensures the integrity of the data throughout the Application block.

The error-handling responsibility inside the Application block is divided between the hidden methods of a job, which should protect against exceptions that occur during processing, and the block frame together with the scheduler. There are several mechanisms that protect the Application block against execution failures. Every port has a validity attribute that is set by jobs when they fill it after processing; if the attribute is false, all the jobs that consider the port as input will not execute in the current wave. Every job checks, after initialization, that the necessary data and input ports exist; if this is not the case, the job is eliminated from the pipeline and the dependent jobs will not execute. If an error occurs in the run phase of a job, the job is considered corrupted, and it is skipped and logged accordingly.
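One possible reading of the port-validity mechanism is sketched below. In this sketch, a failing job is logged and its output port stays invalid, so every job that depends on that port is skipped for the current wave. The Port, Job, and run_wave names are hypothetical, and this is not EECVF's actual implementation:

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

logger = logging.getLogger("application_block")


@dataclass
class Port:
    data: Any = None
    valid: bool = False


@dataclass
class Job:
    name: str
    inputs: List[str]
    output: str
    func: Callable[..., Any]


def run_wave(jobs: List[Job], ports: Dict[str, Port]) -> None:
    for job in jobs:
        out = ports.setdefault(job.output, Port())
        out.valid = False  # becomes valid only if this wave's run succeeds
        if not all(name in ports and ports[name].valid for name in job.inputs):
            logger.info("skipping %s: invalid input port", job.name)
            continue
        try:
            out.data = job.func(*(ports[name].data for name in job.inputs))
            out.valid = True
        except Exception:
            # The failure is logged; the invalid output protects dependent jobs.
            logger.exception("job %s failed in run phase; output left invalid", job.name)
```

The pipeline itself never crashes. The "broken" job's exception is logged, and the job that consumes its output is skipped, while independent jobs still produce valid results.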

Statistics from the Application block are generated automatically for each wave using the Data Processing block. Statistics are also generated for each job, according to its nature and configuration. The minimum data logged for each job are the output image and the run time analysis. Of course, users can configure the quantity of logging data needed for the application.

In a sense, this block can be considered the main block of the framework, as the CV pipeline is created, customized, and executed inside it.

3.4. Benchmark Block

The Benchmark block handles the evaluation and validation of data from EECVF. In this block, users can select from several benchmark algorithms to evaluate their results. Typically, users run this block after a CV pipeline has been executed by the Application block, to obtain the necessary metrics.

In our opinion, this block does not need a parallel design because, typically, we do not evaluate at the same time as we run an application pipeline. The block is able to run multiple evaluations for one application. In Figure 9, we present the inner design of the block. The output of the evaluation on multiple sets of data is automatically saved in the results for each set of ports.

A limitation of this block worth mentioning is that the jobs inside it cannot be integrated with Application block jobs. This limitation is intentional, because the EECVF environment considers that benchmarking is done after the CV pipeline has been executed.

Using the Data Processing block, we can visualize and plot the results of the Benchmark block for the data we generate using the Application block. As with the other blocks, the evaluation algorithms provided by this block depend directly on the domains where EECVF will be used.

3.5. Data Processing

The Data Processing block is the block that handles the “communication” between users and EECVF. By “communication”, we mean the exchange of data between the framework and the user, such as saved images, tables, plots, or other metrics.

Users can select one of several data manipulation services that create plots or metrics from the statistics saved by the other blocks. By default, the EECVF saves a limited series of data, which can be found in the Logs folder. The Data Processing block is meant to facilitate the interaction between users and EECVF, and it is used by all the other blocks of the system.

Even if this block does not have an internal design structure, we consider it to be one of the most important ones, because all the other blocks depend heavily on it. This block handles the logging mechanism, an important feature of our framework: a research-based framework should focus on offering useful information to its users rather than on running quickly and silently.

3.6. Job Adding

As stated before, adding new functionalities to the framework is an easy task because of the architecture chosen when we developed it. All functionalities of the EECVF fall under the generic term of job. In the following, we explain how to add a new job to the Application block. We chose this block because it requires a more specific job structure, so that the main execution loop of the pipeline is able to use the job.

All the needed functionality should be encapsulated in one Python module, to respect the modular principle we imposed on the framework. The new job should offer three public methods so that it can link to the framework: a user-interface method, an initialization method, and a run method. A template for the public methods of a job is offered in Application/Jobs/job_template.py.
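To make the three-method structure concrete, the following sketch shows a hypothetical "invert" job written in the same spirit. The names do_invert_image_job, init_func_invert, main_func_invert, the PIPELINE list, and the "NAME_LEVEL" port naming are all illustrative assumptions; the real signatures are those given in Application/Jobs/job_template.py.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the Application block's list of configured jobs.
PIPELINE: List[Dict[str, Any]] = []


def init_func_invert(port_output_name: str, level: str) -> Dict[str, str]:
    """Initialization method: declare the output port this job will fill."""
    return {"output": f"{port_output_name}_{level}"}


def main_func_invert(ports: Dict[str, List[int]], port_input_name: str,
                     port_output_name: str, level: str) -> bool:
    """Run method: invert 8-bit values; return False instead of raising on missing input."""
    src = ports.get(f"{port_input_name}_{level}")
    if src is None:
        return False
    ports[f"{port_output_name}_{level}"] = [255 - v for v in src]
    return True


def do_invert_image_job(port_input_name: str, wave_offset: int = 0, level: str = "LVL_0",
                        port_output_name: Optional[str] = None) -> str:
    """User-interface method: describe the job, register it, and return the output port name."""
    if port_output_name is None:
        port_output_name = "INVERT_" + port_input_name  # default output port name
    PIPELINE.append({
        "name": f"Invert {port_input_name} {level}",
        "wave_offset": wave_offset,  # recorded only; this sketch always reads the current wave
        "init_func": init_func_invert,
        "init_args": [port_output_name, level],
        "main_func": main_func_invert,
        "main_args": [port_input_name, port_output_name, level],
    })
    return port_output_name
```

Returning the output port name from the user-interface method lets later jobs be chained on it without retyping the name.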

The user-interface method should configure the transition between the user interface and the Application block and describe the configuration of the job. As can be seen in the template, the method should have a series of mandatory parameters: the name of the input port with the respective wave (we can process an input from a past wave) and level (pyramid level or custom); names for one or several output ports; and the parameters needed by the actual function.

Inside the user-interface method of the job, users should create lists of input and output ports with a specific format. For transforming the port name and size, we offer specific services (transform_port_name_lvl and transform_port_size_lvl) that can be used to create the name and size of the port. Another important aspect this method handles is specifying the initialization function and the run function of the new job, and configuring the list of parameters that those two functions should receive.
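The role of the two naming services can be sketched as follows. port_name_for_level and port_size_for_level are hypothetical helpers, not the actual transform_port_name_lvl and transform_port_size_lvl: the "_L<k>" suffix is an assumed convention, and the size rule assumes that each pyramid level halves the image, rounding up as OpenCV's pyrDown does by default.

```python
from typing import Tuple


def port_name_for_level(name: str, level: int) -> str:
    """Attach an (assumed) pyramid-level suffix to a port name."""
    return f"{name}_L{level}"


def port_size_for_level(size: Tuple[int, int], level: int) -> Tuple[int, int]:
    """(height, width) of an image after `level` halvings, rounding up at each step."""
    h, w = size
    for _ in range(level):
        h, w = (h + 1) // 2, (w + 1) // 2
    return h, w
```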

When adding a new job, if users would like to use existing jobs as inner steps of the new functionality, they should configure them in the user-interface method. This is recommended because it helps the framework maintain a slim pipeline, without different jobs that perform the same functionality.

Both init_function and main_function of the new job should verify the integrity of the ports they use and avoid crashing due to run-time exceptions. These basic aspects are covered if the user follows the template offered for these public methods.

After the three public methods are constructed, the user only needs to add the user-interface method to the interface of the Application block (Application/__init__.py) and the new module to the package init module (Application/Jobs/__init__.py).

Adding new jobs works the same way for the other blocks, with the benefit that only one public method is needed, representing the functionality's interface with EECVF. Another important aspect to take into consideration is that, if external libraries or repositories are used when adding new jobs, the dependencies have to be added to the requirements.txt file. This is important so that future users are able to trigger the new jobs.

4. Example Use Case

In this section, we consider an example application that presents the benefits of using EECVF for day-to-day research. Research is a methodical activity, but it is not always clean in its incipient phases, a fact that we want to reflect in our experiment. Because we want to present the benefits that our framework can bring to research work, we will not focus on fine-tuning the models or cleaning up the pipelines for better results. Rather, we deliberately complicate the pipeline with additional paths in order to present more results.

The example presented in this section shows how to configure the EECVF to train several ML models, how to use the training output in a CV pipeline, and how to evaluate the results at the end. All the jobs used in this example are available to users by default in the framework. One can reproduce the example by running main_eecvf_jurnal.py from the framework. All the necessary dependencies (libraries, sub-repositories) are installed when running the setup_framework.py module.

We consider an experiment in which we want to determine the best edge detector for urban scenarios when the images are segmented prior to edge map processing. To do so, we consider two datasets specialized for this scenario: LabelMe Facade [43,44] and TMBuD [45]; two semantic segmentation models: U-Net [46,47] with a VGG16 [48] encoder and SegNet [49] with a ResNet-50 [50] encoder; and the Canny [51], Shen-Castan [52], and Edge Drawing [53] edge detection algorithms.

The complete pipeline is presented in Figure 10. For a better visual understanding, we used several annotations: VGG U-Net represents the model weights resulting from the VGG U-Net training and ResNet-50 SegNet represents the weights obtained from the ResNet-50 SegNet training; *1 represents the edge detection algorithms used; *2 represents the part of the pipeline in which we apply the segmentation, group the resulting semantic classes into foreground and background, intersect the binary map with the image, and apply the edge detection algorithms (*1); and *3 represents the part of the pipeline where we apply block *2 on the image after smoothing it with the Bilateral Filter [54] and the Anisotropic Filter [55]. At the end of the pipeline, we evaluate the semantic segmentation results using Intersection over Union, or the Jaccard index (IoU), and the edge maps using the popular Figure of Merit (FOM) [56] and the Correspondence Pixel Metric (CPM) [57].
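For reference, the two simpler metrics named above can be written in a few lines of NumPy. This is a minimal sketch of the textbook definitions, not the EECVF benchmark code: IoU = |P ∩ G| / |P ∪ G|, and Pratt's FOM = (1 / max(N_I, N_A)) Σ 1 / (1 + α d_i²) with the customary α = 1/9, where N_I and N_A are the numbers of ideal and detected edge pixels and d_i is the distance from the i-th detected pixel to the nearest ideal one. CPM is omitted because it requires a pixel-matching step.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray, cls: int) -> float:
    """Jaccard index |P ∩ G| / |P ∪ G| for one class label (nan if the class is absent in both)."""
    p, g = pred == cls, gt == cls
    union = np.logical_or(p, g).sum()
    if union == 0:
        return float("nan")
    return float(np.logical_and(p, g).sum() / union)


def pratt_fom(edges: np.ndarray, gt_edges: np.ndarray, alpha: float = 1.0 / 9.0) -> float:
    """Pratt's Figure of Merit for binary edge maps (1.0 means a perfect match)."""
    det = np.argwhere(edges > 0)     # detected edge pixels, shape (N_A, 2)
    ref = np.argwhere(gt_edges > 0)  # ideal edge pixels, shape (N_I, 2)
    if len(det) == 0 or len(ref) == 0:
        return 0.0
    # Brute-force squared distance from each detected pixel to its nearest ideal pixel;
    # memory grows as N_A * N_I, so this only suits small images.
    d2 = ((det[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return float(np.sum(1.0 / (1.0 + alpha * d2)) / max(len(det), len(ref)))
```

For instance, a vertical edge detected one pixel away from its ground truth gives 1 / (1 + 1/9) = 0.9 for every pixel, hence an FOM of 0.9.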

This complex pipeline, which runs over several pyramid levels, was chosen to highlight the benefits of the framework. As we will see in this section, the user triggers a total of 752 jobs in the final pipeline and saves only 92 of the 902 ports that the application constructs. From our point of view, the fact that a user can describe a pipeline (see Figure 10) and configure which ports to save and which to evaluate demonstrates that the framework is easy to use.

As we can see in Figure 10, we evaluate the results at the original image size, Pyramid Level 0, but we also want to see how the resulting edge map is affected if we process one level lower in the pyramid and reconstruct the edge map back to the original level. Reconstructing features obtained at lower pyramid levels is a common practice in the CV domain [58,59].

Another aspect we want to present in our experiment is the error handling capability of EECVF. To do so, we set the pipeline to run for Pyramid Level 2 even though do_pyramid_level_down_job has been given the parameter number_of_lvl = 1. This causes the EECVF to discard the entire sub-pipeline in the preprocessing phase of the Application block. With this intentional fault, we wish to exemplify the framework's capability to handle errors; for a software system like this, the most important aspect is the capability to handle faults without stopping the execution.
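The intentional fault can be reproduced in miniature. In the sketch below, which is not EECVF code, a pyramid-down step configured with number_of_lvl = 1 produces only levels 0 and 1; prune_pipeline then walks the jobs in dependency order and keeps a job only if all of its input ports can exist, so the level-2 branch and everything depending on it are discarded while the rest of the pipeline survives. The "_L<k>" port names are assumptions.

```python
from typing import List, Sequence, Tuple

JobSpec = Tuple[str, Sequence[str], Sequence[str]]  # (name, input ports, output ports)


def pyramid_ports(base: str, number_of_lvl: int) -> List[str]:
    """Ports created by a pyramid-down step: the original level plus number_of_lvl reduced ones."""
    return [f"{base}_L{k}" for k in range(number_of_lvl + 1)]


def prune_pipeline(jobs: Sequence[JobSpec],
                   available: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (kept, discarded) job names; a job is kept only if all its inputs can exist."""
    ports = set(available)
    kept: List[str] = []
    discarded: List[str] = []
    for name, inputs, outputs in jobs:
        if all(p in ports for p in inputs):
            kept.append(name)
            ports.update(outputs)  # later jobs may consume these outputs
        else:
            discarded.append(name)  # its outputs never appear, so dependants are dropped too
    return kept, discarded
```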

In Figure 11, we present the datasets used in our example: the LabelMe Facade dataset for training the semantic segmentation models and the TMBuD dataset for evaluating the semantic segmentation output and the edge maps resulting at the end of the pipeline. The fact that the two datasets have different perspectives and classes will have some negative effects on the results, but in CV applications it is common to use several datasets.

In Figure 12, we can observe how to set up the data augmentation for training the semantic segmentation models and how to split the data. Figure 13 shows an example of data augmentation performed according to the desired configuration. Using EECVF, we can better observe the effect of the augmentations applied to our data, which can be an important benefit when trying to understand the chain of effects.

As stated before, because of the modular implementation of the framework, we can use jobs from one block in other blocks to obtain the desired functionality. In this example, we use the Application block for data augmentation rather than the augmentation options offered by the Python libraries used for training. We consider that, by using the EECVF in this way, the researcher better understands the flow of data through the system.

To be able to use the datasets together, we need to correlate the class annotations between them. In Figure 12, we can see that the do_class_correlation job is used for this task. The resulting images are then used in the training process.
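Correlating two annotation schemes usually amounts to remapping label values through a lookup table. The sketch below shows that idea with NumPy; correlate_classes and the example mapping are hypothetical, and the real class identifiers of LabelMe Facade and TMBuD are not reproduced here.

```python
import numpy as np


def correlate_classes(labels: np.ndarray, mapping: dict, num_src_classes: int,
                      fill: int = 0) -> np.ndarray:
    """Remap source class ids to target ids; ids missing from `mapping` become `fill`."""
    lut = np.full(num_src_classes, fill, dtype=labels.dtype)
    for src, dst in mapping.items():
        lut[src] = dst
    return lut[labels]  # integer-array indexing applies the table to every pixel
```

A call such as correlate_classes(label_img, {1: 2, 2: 2, 3: 1}, num_src_classes=4) merges source classes 1 and 2 into target class 2 and sends class 3 to target class 1.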

We observe in Figure 12 that, to prepare the data for training, we apply a series of transformations meant to enrich the training dataset. We start by changing the resolution of the image to 320 × 520 using do_resize_image_job. Afterwards, on the resized image, we apply several augmentations, such as flipping (do_flip_image_job), zooming (do_zoom_image_job), rotation (do_rotate_image_job), motion blur (do_motion_blur_filter_job), and so on. To link the ports between jobs, we only need to set the port_output_name parameter of do_resize_image_job to 'RAW_RESIZE' and then use the same value for the port_input_name parameter of the jobs used for augmentation.
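Linking jobs through port names can be illustrated with a dictionary of named ports. The resize_job and flip_job functions below are hypothetical stand-ins for do_resize_image_job and do_flip_image_job (nearest-neighbour resizing via index arithmetic, horizontal flip via np.flip); only the 'RAW_RESIZE' name is taken from the text above.

```python
from typing import Dict, Tuple

import numpy as np


def resize_job(ports: Dict[str, np.ndarray], port_input_name: str,
               port_output_name: str, size: Tuple[int, int]) -> str:
    """Nearest-neighbour resize to (height, width); returns the output port name."""
    img = ports[port_input_name]
    h, w = size
    rows = np.arange(h) * img.shape[0] // h  # source row for each output row
    cols = np.arange(w) * img.shape[1] // w  # source column for each output column
    ports[port_output_name] = img[rows[:, None], cols]
    return port_output_name


def flip_job(ports: Dict[str, np.ndarray], port_input_name: str,
             port_output_name: str) -> str:
    """Horizontal flip; returns the output port name."""
    ports[port_output_name] = np.flip(ports[port_input_name], axis=1)
    return port_output_name
```

Because each job returns the name of the port it writes, the flip can be chained directly on the resize result: flip_job(ports, resize_job(ports, "RAW", "RAW_RESIZE", size), "FLIP_RAW_RESIZE").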

In Figure 14, we can see how the VGG-Unet and ResNet-SegNet are configured for training. The VGG-Unet is trained with 8 classes for 70 epochs with a batch size of 8, with 20 steps per epoch on the training data and 58 steps for the validation data. Similarly, we set up the training of the ResNet-SegNet for 70 epochs with a batch size of 4, 58 steps per epoch on the training data, and 117 steps for the validation data. The networks could, of course, be fine-tuned extensively, but for this example we consider this configuration good enough. The do_semseg_base job supports multiple models, already provided by the framework, which can be selected using the model parameter.

Using the set_image_input_folder, set_label_input_folder, set_image_validate_folder, and set_label_validate_folder services, we configure the framework to use the images output by the preparation activities. We use the do_semseg_base job from the AI block to train the networks with the stated configuration. This configuration is easy to adapt, as we can change the size of the images and add or remove augmentations from the user interface alone.

Another interesting aspect that we can see in Figure 14 is that, for this phase of the pipeline, we use all the blocks of the framework. This configuration has two benefits: it gives the user more detailed control over the activities of this phase, and it helps us exemplify the interconnections that can be made. Another way to execute this phase is to incorporate the augmentation and validation inside the AI block job, using methods provided by libraries (TensorFlow, PyTorch).

In Figure 15, Figure 16, Figure 17 and Figure 18, we present the training results of the models. The corresponding plots are automatically exported by EECVF when doing any training.

In Figure 19, we present the results of evaluating the models with the IoU metric. The models are evaluated twice: first inside the AI block, as part of the training mechanism (see Figure 15, Figure 16, Figure 17 and Figure 18), and second using the Application block. Using the run_IoU_benchmark job from the Benchmark block, we configure the IoU evaluation on the data (see Figure 14). The data are processed by the Application block using do_semseg_base_job, a job that takes the data from the AI block.

We configured the Application block to handle the augmentation, the AI block to handle the training, and the Benchmark block to handle the evaluation of the models. As we can observe in Figure 19, the framework automatically outputs the average time for every phase, wave, or job, as well as the memory address of each created port.

As stated at the beginning of this section, we do not fine-tune the networks further, as this is outside our scope, even though it would be necessary in a normal application.

The setup code corresponding to the application experiment is presented in Figure 20. As input, the system uses the test subset of the TMBuD dataset, which consists of 35 images of buildings. When adding the smoothing jobs to the pipelines, we took into consideration several variations of the configuration parameters. This causes the pipeline to create divergent paths that result in several additional edge maps at the end.

In Figure 20, one can see how we configured the CV pipeline. To make user interaction with the framework easier, jobs return the names of the output ports they create. This is an important feature, because we use the port names to link the flow of jobs.

As stated in the introduction of the example, we smooth the input image with several variants of the Bilateral filter [54] and the Anisotropic Diffusion filter [55]. This is done by re-triggering the job in the user interface with different parameters. One important aspect when doing this is to change the name of the output port; otherwise, the job will be discarded, as it would not bring added value to the final pipeline.
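The discarding rule can be sketched as a first-come deduplication keyed on the output port name. This is an assumption made for illustration: trim_by_output_port and the job dictionaries are hypothetical, and the actual check performed by the framework (Algorithm 1) may compare more than the name.

```python
from typing import Dict, List


def trim_by_output_port(triggered: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the first job that writes each output port; later ones add no value."""
    seen = set()
    unique = []
    for job in triggered:
        if job["output"] in seen:
            continue  # same output name: dropped even if its parameters differ
        seen.add(job["output"])
        unique.append(job)
    return unique
```

Under this rule, a second Bilateral filter with new parameters survives only if it writes to a new port name, which is why the output port must be renamed when a job is re-triggered.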

All the jobs offered by the framework have default output port names, so users do not need to add a specific name for the output port. As we can see in Figure 20, most of the jobs that we use do not configure the output name parameter and rely on the default names.

When configuring our edge detection jobs, do_canny_otsu_median_sigma_job for Canny [51] and do_edge_drawing_mod_job for ED [53] use the Orhei operator [60], dilated [61] with a factor of 2. For the Shen–Castan [52] operator, we use the standard binary Laplace operator.
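Dilating a 3 × 3 edge kernel by a factor of 2 can be sketched as spreading its coefficients onto a 5 × 5 grid. This sketch assumes zero insertion between coefficients (à trous style) and uses the Sobel x-kernel as a stand-in, because the coefficients of the Orhei operator are not given here; the dilation scheme of [61] may fill the gaps differently.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])  # stand-in kernel, not the Orhei operator


def dilate_kernel(kernel: np.ndarray, factor: int) -> np.ndarray:
    """Place coefficients every `factor` pixels, leaving zeros in between (assumed scheme)."""
    k_h, k_w = kernel.shape
    out = np.zeros(((k_h - 1) * factor + 1, (k_w - 1) * factor + 1), dtype=kernel.dtype)
    out[::factor, ::factor] = kernel
    return out
```

The dilated kernel keeps the same coefficients and sum, but samples gradients over a wider neighbourhood without adding weights.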

We observe in Figure 20 that the output ports of the edge detection jobs are saved in a list. We later use this list to tell the configure_save_pictures service which ports we want to save. Afterwards, we use the same list to configure which ports to evaluate using the Benchmark block. For both benchmark jobs, run_FOM_benchmark and run_bsds500_boundary_benchmark, we need to set the following parameters: the location of the output images, the location of the ground-truth images, the location of the original images, and a list of ports to evaluate.

In Figure 21, we present part of the data that EECVF outputs when executed. The first thing we can observe is that the described pipeline is trimmed to the slimmest possible one, as stated in Algorithm 1. In this example, we triggered a total of 752 jobs, which are parsed by the algorithm, and in the end only 654 unique jobs with 902 ports are executed. For example, the Application block handles the duplication of do_grayscale_transform_job: this job is triggered individually by us, but it is also part of the edge detection jobs.

As described in Section 3, the framework adds jobs by itself if they are needed. For example, in Figure 20 the user triggered do_shen_castan_job, but EECVF adds the following jobs to the pipeline, as we can see in Figure 21: do_isef_filter_job, the specific ISEF filter; do_laplacian_from_img_diff_job, or another Laplace variant configured by the user, followed by do_zero_crossing_adaptive_window_isef_job; and do_threshold_hysteresis_isef_job as the last step of the algorithm. These jobs are not random: they are the inner steps of the Shen–Castan algorithm, and we split the algorithm this way for a better optimization of jobs and ports. This breaking up of jobs into smaller jobs is detailed in the description of each individual job.
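The automatic decomposition can be sketched as expanding each composite job into its inner steps and then keeping each step only once. The INNER_STEPS table below is an assumption built from the job names listed above (placing the grayscale conversion first is an illustrative choice), and expand_and_trim is a hypothetical helper that keys jobs by name only, whereas the framework also has to consider input ports and parameters.

```python
from typing import Dict, List

# Assumed decomposition table: composite job -> ordered inner steps.
INNER_STEPS: Dict[str, List[str]] = {
    "do_shen_castan_job": [
        "do_grayscale_transform_job",
        "do_isef_filter_job",
        "do_laplacian_from_img_diff_job",
        "do_zero_crossing_adaptive_window_isef_job",
        "do_threshold_hysteresis_isef_job",
    ],
}


def expand_and_trim(triggered: List[str]) -> List[str]:
    """Expand composite jobs into inner steps; a step shared by several jobs runs once."""
    pipeline: List[str] = []
    for job in triggered:
        for step in INNER_STEPS.get(job, [job]):  # non-composite jobs expand to themselves
            if step not in pipeline:
                pipeline.append(step)
    return pipeline
```

Triggering do_grayscale_transform_job individually and then do_shen_castan_job yields five steps rather than six, mirroring how the duplicated grayscale job is removed in the experiment.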

We can observe in Figure 21 that the framework can handle missing data or corrupted flows. As mentioned at the beginning of the example, we triggered a branch of the pipeline to run for the second pyramid level without the input image being available. The framework reports that the job was triggered by the user but could not run.

Some images from the processing of the pipeline are presented in Figure 22, to demonstrate that the pipeline executed all the steps described in the example presented in Figure 10. The list of saved ports can be easily configured using the configure_save_pictures service. This service tells the Data Processing block how to configure the saving of ports. This is an important aspect to configure, because the number of ports to save affects the run-time and the memory that the application occupies on the hardware. We could not show all 96 edge maps that the pipeline exports at the end because of space constraints, but the example can be reproduced from the EECVF repository.

For our experiment, we evaluate the resulting edge maps using the Benchmark block. In Figure 23, we plot the best 25 results of the CPM evaluation done by the run_bsds500_boundary_benchmark job. This is done using the plot_first_cpm_results job from the Data Processing block. In Figure 24, we plot the average run-time, a commonly desired piece of information when executing a CV pipeline. Furthermore, we configured plot_avg_time_jobs not to add any legend, because the pipeline consists of a high number of jobs.
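Selecting the best 25 results for such a plot amounts to sorting the scores and keeping the first k. best_results is a hypothetical helper, not the implementation of plot_first_cpm_results, which also handles the plotting itself.

```python
from typing import Dict, List, Tuple


def best_results(scores: Dict[str, float], k: int = 25) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (port name, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```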

In this section, using the experimental pipeline, we have demonstrated that the proposed framework, EECVF, is capable of handling, configuring, and evaluating a complex CV pipeline with several paths and a large number of jobs and ports. Another important aspect to reiterate is the robust and stable error handling mechanism incorporated in the framework. Using the parts of the code presented in Figure 14 and Figure 20, we gave examples of how to configure such a pipeline. Users can execute the main_eecvf_jurnal.py example from the framework repository to reproduce the example.

5. EECVF Used in Education and Research

Teaching image processing techniques to students is often challenging. Image processing requires knowledge on the inner workings of the visual system of humans and on how they process the acquired visual information. It also requires that students understand how the visual effect of image processing relates to the mathematical algorithms used in processing.

In the past years, CV has touched upon all fields of modern life, including teaching and education. One of the common ways in which CV is involved in this field is through Augmented and Virtual Reality technologies [62,63,64,65,66,67]. Once students understand how the visual effect of processing relates to the underlying algorithms, they may proceed to the implementation stage of an image processing algorithm using a specific programming language [68].

A comprehensive review of computer vision education [69] points out the following areas in which EECVF might bring specific benefits:

Usage in core courses. Core courses in Computer Science are usually too oriented towards computers themselves and not towards real-world applications. Thus, lab exercises and homework assignments are not engaging enough for students to foster positive learning. Introducing Computer Vision examples and applications in courses such as programming, data structures, algorithms, math, or hardware can make the mostly theoretical knowledge more interesting and practical for students [69]. Basic algorithms can be demonstrated as image processing algorithms, data structures can be taught as inputs or outputs of a Computer Vision pipeline, programming assignments can be formulated so that they generate a processed image at the end, and so on.

Effective and flexible software tool. CV is a vast domain and students need guidance and predictability when tackling a CV assignment. Even a simple system is hard for them to implement in a short period of time if they need to develop everything from scratch [69].

Tool for teachers to design assignments. A source of difficulty in every CV course is the need to cover, in the same semester, both basic methods and algorithms and the latest findings and applications in the field. To do this, teachers need to be able to explain specific tasks in the CV process but also demonstrate real-world applications that ignite and maintain the interest of the students [69].

Usage in research for education. Research is insufficiently exploited and integrated in student education, as a means of enhancing their critical thinking skills, creativity, and ability to work in collaborative projects [69].

Although we have drawn some positive empirical insights from using the EECVF framework with our students, we plan to run a comprehensive research study to determine to what extent EECVF can alleviate the previous obstacles in computer vision education and research through the following features: breaking down a CV application in manageable steps; hiding the inner workings of a CV pipeline when this is not important for the course-specific task; out-of-the-box debugging capabilities.

6. Conclusions

EECVF is an open-source, Python-based framework that aims to assist researchers and teachers in day-to-day activities in the CV domain. As we presented in Section 3, the EECVF is a complex framework but easy to use. EECVF incorporates all the steps needed for a research activity in the CV domain.

Because of the configuration layer for each block, the user does not need to understand the way the predefined jobs are implemented. Another benefit is that users do not need to concern themselves with the data processing flow because that is embedded in the framework.

In Section 2, we presented a literature review and analysis of existing frameworks, highlighting the strong points of each. As we can observe, in recent years, the Python language has become the preferred programming language for the existing framework solutions. This trend is supported by Python’s capacity to interact with other programming languages and its cross-platform design.

We consider that, through the example described in Section 4, we managed to present the “one click” quality of our framework. As we saw, users can configure mixed CV pipelines (classic and AI elements) according to the desired information, and the results are saved and evaluated. This aspect is important considering that, in a modern CV application, data processing has become an essential process.

EECVF is an easy-to-use, modular, and flexible software framework that can handle complicated CV pipelines while offering the user all the information necessary to understand the effect of each block in the chain. Combining the robust design of our framework with the advantages of the Python programming language has proven beneficial to the outcome. EECVF can run on multiple operating systems with minimal changes.

For further work, we intend to focus on improving the inner workings of the framework and to offer much more information regarding resource and job management. For this improvement, we are considering external tools or libraries such as SLURM [70].

From our perspective, the EECVF is a continuously evolving framework. We believe that new image processing concepts, jobs, and services will be added as a result of day-to-day research or educational activities. The rate of EECVF development will probably keep pace with the evolution of the CV domain.

Author Contributions

Conceptualization, C.O., S.V., M.M., and R.V.; Methodology, C.O., S.V., and M.M.; Software, C.O.; Validation, C.O., S.V., and M.M.; Formal analysis, S.V. and R.V.; Investigation, C.O. and M.M.; Resources, C.O., S.V., and M.M.; Data curation, C.O., S.V., M.M., and R.V.; Writing—original draft preparation, C.O., S.V., M.M., and R.V.; Writing—review and editing, C.O., S.V., M.M., and R.V.; Visualization, C.O.; Supervision, R.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Klette, R. Concise Computer Vision; Springer: London, UK, 2014.
Dalarmelina, N.d.V.; Teixeira, M.A.; Meneguette, R.I. A real-time automatic plate recognition system based on optical character recognition and wireless sensor networks for ITS. Sensors 2020, 20, 55.
Dinges, L.; Al-Hamadi, A.; Elzobi, M.; El-Etriby, S. Synthesis of common Arabic handwritings to aid optical character recognition research. Sensors 2016, 16, 346.
Michalak, H.; Okarma, K. Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition. Sensors 2020, 20, 2914.
Zhou, Q.; Chen, R.; Huang, B.; Liu, C.; Yu, J.; Yu, X. An automatic surface defect inspection system for automobiles using machine vision methods. Sensors 2019, 19, 644.
Zhang, X.; Zhang, J.; Ma, M.; Chen, Z.; Yue, S.; He, T.; Xu, X. A high precision quality inspection system for steel bars based on machine vision. Sensors 2018, 18, 2732.
Dorninger, P.; Pfeifer, N. A comprehensive automated 3D approach for building extraction, reconstruction, and regularization from airborne laser scanning point clouds. Sensors 2008, 8, 7323–7343.
Kedzierski, M.; Fryskowska, A. Terrestrial and aerial laser scanning data integration using wavelet analysis for the purpose of 3D building modeling. Sensors 2014, 14, 12070–12092.
Singh, S.P.; Wang, L.; Gupta, S.; Goli, H.; Padmanabhan, P.; Gulyás, B. 3D deep learning on medical images: A review. Sensors 2020, 20, 5097.
Singh, S.P.; Wang, L.; Gupta, S.; Gulyás, B.; Padmanabhan, P. Shallow 3D CNN for detecting acute brain hemorrhage from medical imaging sensors. IEEE Sens. J. 2020.
Kocić, J.; Jovičić, N.; Drndarević, V. An end-to-end deep neural network for autonomous driving designed for embedded automotive platforms. Sensors 2019, 19, 2064.
Baba, M.; Gui, V.; Cernazanu, C.; Pescaru, D. A sensor network approach for violence detection in smart cities using deep learning. Sensors 2019, 19, 1676.
Tang, K.; Liu, A.; Wang, W.; Li, P.; Chen, X. A novel fingerprint sensing technology based on electrostatic imaging. Sensors 2018, 18, 3050.
Simion, G.; Gui, V.; Otesteanu, M. Finger detection based on hand contour and colour information. In Proceedings of the 2011 6th IEEE International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, 19–21 May 2011; pp. 97–100.
Mirsu, R.; Simion, G.; Caleanu, C.D.; Pop-Calimanu, I.M. A PointNet-Based Solution for 3D Hand Gesture Recognition. Sensors 2020, 20, 3226.
Zengeler, N.; Kopinski, T.; Handmann, U. Hand gesture recognition in automotive human–machine interaction using depth cameras. Sensors 2019, 19, 59.
Szeliski, R. Computer Vision: Algorithms and Applications; Springer: London, UK, 2010.
Krig, S. Vision pipelines and optimizations. In Computer Vision Metrics; Springer: Cham, Switzerland, 2016; pp. 273–317.
Orhei, C.; Mocofan, M.; Vert, S.; Vasiu, R. End-to-End Computer Vision Framework. In Proceedings of the 2020 International Symposium on Electronics and Telecommunications (ISETC), Timisoara, Romania, 5–6 November 2020; pp. 1–4.
Buckler, M.; Jayasuriya, S.; Sampson, A. Reconfiguring the imaging pipeline for computer vision. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017; pp. 975–984.
Patel, P.; Thakkar, A. The upsurge of deep learning for computer vision applications. Int. J. Electr. Comput. Eng. 2020, 10, 538.
Perrault, R.; Shoham, Y.; Brynjolfsson, E.; Clark, J.; Etchemendy, J.; Grosz, B.; Lyons, T.; Manyika, J.; Mishra, S.; Niebles, J.C. The AI Index 2019 Annual Report; AI Index Steering Committee, Human-Centered AI Institute, Stanford University: Stanford, CA, USA, 2019.
End-to-End CV Framework (EECVF). Available online: https://github.com/CipiOrhei/eecvf (accessed on 15 April 2021).
Thompson, C.; Shure, L. Image Processing Toolbox: For Use with MATLAB; [User’s Guide]; MathWorks: Natick, MA, USA, 1995.
Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 25, 120–125.
Tschumperlé, D. The cimg library. In Proceedings of the IPOL 2012 Meeting on Image Processing Libraries, Cachan, France, 27 June 2012.
Kovesi, P.D. MATLAB and Octave Functions for Computer Vision and Image Processing; Centre for Exploration Targeting, School of Earth and Environment, The University of Western Australia: Perth, Australia, 2000; Volume 147, p. 230. Available online: http://www.csse.uwa.edu.au/~pk/research/matlabfns (accessed on 22 April 2021).
Chen, M.; Radford, A.; Child, R.; Wu, J.; Jun, H.; Dhariwal, P.; Luan, D.; Sutskever, I. Generative Pretraining from Pixels. In Proceedings of the 37th International Conference on Machine Learning, 13–18 July 2020.
Holmes, G.; Donkin, A.; Witten, I.H. WEKA: A machine learning workbench. In Proceedings of the ANZIIS ’94—Australian New Zealand Intelligent Information Systems Conference, Brisbane, Australia, 29 November–2 December 1994; pp. 357–361.
Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. The WEKA data mining software: An update. SIGKDD Explor. Newsl. 2008, 11, 10–18.
Williams, G.J. Rattle: A data mining GUI for R. R J. 2009, 1, 45–55.
Williams, G. Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery; Springer: London, UK, 2011.
Gould, S. DARWIN: A Framework for Machine Learning and Computer Vision Research and Development. J. Mach. Learn. Res. 2012, 13, 3533–3537.
Schindelin, J.; Arganda-Carreras, I.; Frise, E.; Kaynig, V.; Longair, M.; Pietzsch, T.; Preibisch, S.; Rueden, C.; Saalfeld, S.; Schmid, B.; et al. Fiji: An open-source platform for biological-image analysis. Nat. Methods 2012, 9, 676–682.
Kiefer, P.; Schmitt, U.; Vorholt, J.A. eMZed: An open source framework in Python for rapid and interactive development of LC/MS data analysis workflows. Bioinformatics 2013, 29, 963–964.
Radlak, K.; Frackiewicz, M.; Szczepanski, M.; Kawulok, M.; Czardybon, M. Adaptive Vision Studio—Educational tool for image processing learning. In Proceedings of the 2015 IEEE Frontiers in Education Conference (FIE), El Paso, TX, USA, 21–24 October 2015; pp. 1–8.
Wang, D.; Foran, D.J.; Qi, X.; Parashar, M. HetroCV: Auto-tuning Framework and Runtime for Image Processing and Computer Vision Applications on Heterogeneous Platform. In Proceedings of the 2015 44th International Conference on Parallel Processing Workshops, Beijing, China, 1–4 September 2015; pp. 119–128.
Alberti, M.; Pondenkandath, V.; Würsch, M.; Ingold, R.; Liwicki, M. DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments. In Proceedings of the 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, USA, 5–8 August 2018; pp. 423–428.
Tokui, S.; Okuta, R.; Akiba, T.; Niitani, Y.; Ogawa, T.; Saito, S.; Suzuki, S.; Uenishi, K.; Vogel, B.; Yamazaki Vincent, H. Chainer: A deep learning framework for accelerating the research cycle. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2002–2011.
Marchesi, M.; Succi, G.; Wells, D.; Williams, L.; Wells, J.D. Extreme Programming Perspectives; Addison-Wesley: Boston, MA, USA, 2003; Volume 176.
Millman, K.J.; Aivazis, M. Python for scientists and engineers. Comput. Sci. Eng. 2011, 13, 9–12.
Freeman, E.; Freeman, E.; Bates, B.; Sierra, K. Head First Design Patterns; O’Reilly & Associates, Inc.: Sebastopol, CA, USA, 2004.
Fröhlich, B.; Rodner, E.; Denzler, J. A Fast Approach for Pixelwise Labeling of Facade Images. In Proceedings of the International Conference on Pattern Recognition (ICPR 2010), Istanbul, Turkey, 23–26 August 2010.
Brust, C.A.; Sickert, S.; Simon, M.; Rodner, E.; Denzler, J. Efficient Convolutional Patch Networks for Scene Understanding. In Proceedings of the CVPR Workshop on Scene Understanding (CVPR-WS), Boston, MA, USA, 7–12 June 2015.
TiMisoara Building Dataset Timisoara. Available online: https://github.com/CipiOrhei/TMBuD (accessed on 12 March 2021).
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; pp. 234–241.
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
Zhang, X.; Zou, J.; He, K.; Sun, J. Accelerating very deep convolutional networks for classification and detection. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 1943–1955.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
Xu, Q.; Chakrabarti, C.; Karam, L.J. A distributed Canny edge detector and its implementation on FPGA. In Proceedings of the 2011 Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), Sedona, AZ, USA, 4–7 January 2011; pp. 500–505.
Shen, J.; Castan, S. An optimal linear operator for step edge detection. Graph. Model. Image Process. 1992, 54, 112–133.
Topal, C.; Akinlar, C. Edge drawing: A combined real-time edge and segment detector. J. Vis. Commun. Image Represent. 2012, 23, 862–872.
Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), Bombay, India, 7 January 1998; pp. 839–846.
Perona, P.; Malik, J. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 629–639.
Abdou, I.E.; Pratt, W.K. Quantitative design and evaluation of enhancement/thresholding edge detectors. Proc. IEEE 1979, 67, 753–763.
Prieto, M.; Allen, A. A similarity metric for edge images. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1265–1273.
Adelson, E.H.; Anderson, C.H.; Bergen, J.R.; Burt, P.J.; Ogden, J.M. Pyramid methods in image processing. RCA Eng. 1984, 29, 33–41.
Orhei, C.; Bogdan, V.; Bonchiş, C. Edge map response of dilated and reconstructed classical filters. In Proceedings of the 2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Timisoara, Romania, 1–4 September 2020; pp. 187–194.
Orhei, C.; Vert, S.; Vasiu, R. A Novel Edge Detection Operator for Identifying Buildings in Augmented Reality Applications. In International Conference on Information and Software Technologies; Springer: Cham, Switzerland, 2020; pp. 208–219.
Bogdan, V.; Bonchis, C.; Orhei, C. Custom Dilated Edge Detection Filters. In Proceedings of the 28th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, WSCG 2020, Václav Skala—UNION Agency, Pilsen, Czech Republic, 18–22 May 2020.
Vert, S.; Vasiu, R. Integrating linked data in mobile augmented reality applications. In International Conference on Information and Software Technologies; Springer: Cham, Switzerland, 2014; pp. 324–333.
Vasiu, R.; Andone, D. Ideas and Concepts of ViCaDiS–A Virtual Learning Environment for Digital Students. In Multiple Perspectives on Problem Solving and Learning in the Digital Age; Springer: New York, NY, USA, 2011; pp. 359–376.
Andone, D.; Ternauciuc, A.; Vasiu, R. Using Open Education Tools for a Higher Education Virtual Campus. In Proceedings of the 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT), Timisoara, Romania, 3–7 July 2017; pp. 26–30.
Andone, D.; Vert, S.; Frydenberg, M.; Vasiu, R. Open Virtual Reality Project to Improve Students’ Skills. In Proceedings of the 2018 IEEE 18th International Conference on Advanced Learning Technologies (ICALT), Mumbai, India, 9–13 July 2018; pp. 6–10.
Vert, S.; Vasiu, R. School of the future: Using augmented reality for contextual information and navigation in academic buildings. In Proceedings of the 2012 IEEE 12th International Conference on Advanced Learning Technologies, Rome, Italy, 4–6 July 2012; pp. 728–729.
Vert, S.; Andone, D. Zero-programming augmented reality authoring tools for educators: Status and recommendations. In Proceedings of the 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT), Timisoara, Romania, 3–7 July 2017; pp. 496–498.
Mocofan, M.; Petan, S.; Vasiu, R. Educational framework model for image processing and image databases. In Proceedings of the International Conference on Energy, Environment, Economics, Devices, Systems, Communications, Computers, IAASAT, Iasi, Romania, 1–3 July 2011; Volume II, pp. 143–147.
Bebis, G.; Egbert, D.; Shah, M. Review of computer vision education. IEEE Trans. Educ. 2003, 46, 2–21.
Yoo, A.B.; Jette, M.A.; Grondona, M. Slurm: Simple linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing; Springer: Berlin/Heidelberg, Germany, 2003; pp. 44–60.

Figure 1. Evolution of computer vision [17].

Figure 2. Number of AI papers on arXiv, 2010–2019 [22].

Figure 3. Framework of a modern CV pipeline.

Figure 4. EECVF blocks.

Figure 5. Job structure overview.

Figure 6. AI block.

Figure 7. EECVF application block overview.

Figure 8. EECVF application block sequence diagram.

Figure 9. EECVF benchmark block overview.

Figure 10. EECVF example logical scheme.

Figure 11. (a) Original LabelMe image resized; (b) LabelMe original labels; (c) LabelMe correlated labels; (d) TMBuD original image; (e) TMBuD label; (f) TMBuD edge ground truth.

Figure 12. Code snippet of data preparation for training.

Figure 13. Data augmentation applied for training.

Figure 14. Code snippet for training the semantic segmentation models.

Figure 15. VGG-Unet accuracy.

Figure 16. VGG-Unet loss.

Figure 17. ResNet-SegNet accuracy.

Figure 18. ResNet-SegNet loss.

Figure 19. Console output of training evaluation and IoU results.

Figure 20. EECVF setup of the example.

Figure 21. Console output of the application.

Figure 22. (a) Raw image; (b) Bilateral smoothing result; (c) Anisotropic smoothing result; (d) Gray transform of b; (e) Gray transform of c; (f) ResNet_SegNet result of a; (g) VGG_Unet result of a; (h) Intersection of f with c; (i) $G_{x}$ kernel of h; (j) $G_{y}$ kernel of h; (k) Otsu result of h; (l) Canny result of h; (m) ED result of h; (n) ISEF result of h; (o) Binary Laplace of h; (p) Zero-Crossing of o; (q) Shen-Castan of h; (r) Expanded Canny from L1; (s) Expanded ED from L1; (t) Expanded Shen-Castan from L1.

Figure 23. ROC plot of the 25 best PCM results.

Figure 24. Average runtime of jobs.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Orhei, C.; Vert, S.; Mocofan, M.; Vasiu, R. End-To-End Computer Vision Framework: An Open-Source Platform for Research and Education. Sensors 2021, 21, 3691. https://doi.org/10.3390/s21113691
