-- Main.bboom - 14 Mar 2011

1 Introduction

The component interface and integration plan is one of the deliverables of the Fish4Knowledge project. The purpose of this deliverable is to describe how the individual components of the partners have to cooperate. This plan gives the overall design of the entire system and the role of each component within it. For each component, we have defined the purpose, input, output, possible methods of evaluation and possible failures. The development of each component is entirely the responsibility of the partner concerned. Another task of the partners is to maintain the descriptions of their components using this wiki. Issues that are important for the integration of the different components are also described in this plan. The plan also contains milestones which indicate when important steps in the development of the entire system should be finished.

Component_Interface_Integration_Plan_v0_4.pdf: Component Interface and Integration Plan (will be discussed during Taiwan meeting 04-2011)

2 Individual components

2.1 Fish Detection Component

2.1.1 Purpose:

The purpose of this component is to detect the fish in the video stream. The detection will
locate the fish in each frame. After locating the fish, the contour of the fish will
be saved in the database. The Detection Component can also save the background image,
which can be used for compression and fast scene recovery. Another task of this component
is to describe the scene. For example, it will detect whether it is dark or light and how much pollution
(green water, dirt) is in the water. It may even be able to detect dirt on the camera lens. Techniques
like deconvolution might be able to correct for dirt on the lens, given older recordings of the
scene in which the lens was still clean.

2.1.2 Input: (Videos)

Video streams (necessary)

2.1.3 Output: (Fish Location)

Fish (necessary)
Fish location for each frame (x,y,binary mask or contour,date,time) (necessary)
Scene information (suggested)
Camera information (suggested)
Camera correction for dirtiness (for deconvolution purposes) (minor)
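
As an illustration of the necessary output listed above, the following minimal sketch shows one way a single detection record could be represented. All field names and the default component identifier are hypothetical and are not part of the agreed interface; date and time are merged into one timestamp for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class FishDetection:
    """One detected fish in one frame (all field names are hypothetical)."""
    fish_id: int
    frame_index: int
    x: int                            # reference position of the fish in the frame
    y: int
    contour: List[Tuple[int, int]]    # contour points as (x, y) pixel coordinates
    timestamp: datetime               # date and time of the frame
    component_id: str = "fish-detection"  # which component produced the record


def bounding_box(detection: FishDetection) -> Tuple[int, int, int, int]:
    """Return (min_x, min_y, max_x, max_y) of the contour."""
    xs = [point[0] for point in detection.contour]
    ys = [point[1] for point in detection.contour]
    return (min(xs), min(ys), max(xs), max(ys))
```

A contour was chosen over a binary mask here only because it is more compact to store; a mask could be carried in the same record.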

2.1.4 Evaluation:

The fish detection can be evaluated using a labelled dataset of fish, from which we can determine
the false positive and false negative rates and plot a ROC curve over a range of thresholds. It can be useful
to also determine whether the ROC curves differ between species or between dates/times.
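
The ROC evaluation can be sketched as follows, assuming that the detector gives each candidate a confidence score and that the labelled dataset marks each candidate as fish (1) or not fish (0). The function names and the score convention are assumptions made for this example only.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    scores: detector confidence per candidate (higher means more fish-like).
    labels: ground truth per candidate, 1 for fish and 0 for no fish.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both fish and non-fish examples are required")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a candidate counts as detected
    # when its score is at least the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of ROC points sorted by false positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Running the same functions on subsets of the labelled data (one species, or one time of day) gives the per-species and per-date curves mentioned above.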

2.1.5 Possible Failures:

False positives: A fish is detected where no fish is present. This results in strange recognition
results or an outlier in recognition. Fish recognition can detect some of the false positives and
throw them away, but this is not the task of the fish recognition component.
False negatives: A fish is not detected while it is present. This might give inaccurate results
in the statistics, especially if the detection rates differ between species, but it might be
correctable with ground truth information on smaller subsets.
Incorrect scene information: Incorrect scene information can make it more difficult for the
workflow component to activate the correct components for further fish detection and recognition.
For instance, a green correction filter that would have been applied had the scene information been
correct is not applied. This will result in inaccurate colour descriptions for the fish, making the recognition
more difficult.

2.2 Fish Tracking Component

2.2.1 Purpose:

The fish tracking will follow the fish in the video, recording the position of the fish and the
direction in which it is moving. It also provides information on how long the fish was visible in the
image. More valuable output can be the interaction of the fish with other fish in
the video, such as an analysis of whether a fish is pursued by another fish. Other events that can be detected
are eating, resting, hiding, fighting, mating, schooling and panic. Another possibility is to make
clusters of the behaviour patterns of fish to see if new behaviours can be described. (Note that
the Fish Detection and Tracking components will very likely be combined for performance reasons.)

2.2.2 Input: (Videos)

Video streams (necessary)

2.2.3 Output: (Fish Location)

Fish path (necessary)
Interaction of fish (feature)
Clusters of fish behaviour (feature)
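
A fish path could, for example, be a time-ordered list of positions from which the visible duration and overall direction are derived. The sketch below assumes a path of (time in seconds, x, y) tuples; this representation and the function name are hypothetical.

```python
import math


def summarise_path(path):
    """Return (visible_seconds, heading_degrees) for a fish path.

    path: list of (time_in_seconds, x, y) tuples sorted by time.
    The heading is the direction from the first to the last position,
    measured with atan2 in image coordinates (y grows downwards).
    """
    if len(path) < 2:
        raise ValueError("a path needs at least two positions")
    t_start, x_start, y_start = path[0]
    t_end, x_end, y_end = path[-1]
    visible_seconds = t_end - t_start
    heading_degrees = math.degrees(math.atan2(y_end - y_start, x_end - x_start))
    return visible_seconds, heading_degrees
```

Only the end points are used here; a real tracker would probably also keep the per-frame direction, since fish rarely swim in a straight line.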

2.2.4 Evaluation:

The fish tracking can be evaluated using labelled data, by checking whether or not it is able to keep track
of the fish. Fish behaviour labelling has to be performed by biologists in order to both learn and
evaluate the behaviour analysis. It might also be interesting to look at and visualise clusters for behaviour analysis.

2.2.5 Possible Failures:

Path tracking failures: The fish is not followed correctly, usually because fish overlap in the
images. This can result in, for instance, first following a clownfish and afterwards following a
shark. Depending on whether the fish recognition methods use only the best frame or multiple frames,
different problems can be expected. In the case of the best frame, you get a false negative. In the case
of multiple frames, strange fish descriptions can appear, which will show up as an outlier in the final
fish recognition.
Fish interaction is incorrect: This gives the users an incorrect result when there is a query for
certain behaviour or when the users request statistics.

2.3 Fish Description Component

2.3.1 Purpose:

This component describes the fish found after the detection stage. In order to describe a fish,
multiple features are selected. These can be features that can be understood by the users, like
the number of fins, the kind of tail and the colour. There will probably also be features that are
important from a computer vision point of view but might not be clear to users. Examples
are Gabor filters, which are able to measure texture in fish. Because the fish can be described
in several ways, it will be important to cooperate with marine biologists to understand which
descriptions are important for them.

2.3.2 Input: (Fish location)

Fish (necessary)
Fish location for each frame (x,y,binary mask,date,time)(necessary)
All frames containing the fish (necessary)
List of features/descriptions (suggested)

2.3.3 Output: (Fish description)

For each fish appearance, a vector of description values (necessary)
For each fish appearance, the descriptions suggested by marine biologists (suggested)
For each fish description suggested by marine biologists, a certainty value (suggested)

2.3.4 Evaluation:

The fish description is usually evaluated together with the recognition. Given that we know that
certain species have certain descriptions, we can also evaluate these descriptions. In order to do
this evaluation labelled data is necessary.

2.3.5 Possible Failures:

Failure in description values: It is possible to have incorrect values in the description vector.
Computer vision usually uses a large set of description values to become robust against a single
failure in recognition and clustering. Certain failures have an effect on both recognition and
clustering, so searching for robust features or higher level features in those cases is necessary.
Failure in descriptions suggested by marine biologists: These features have to be even more
robust than the description values, because they can be directly used for querying. With these
descriptions, a certainty value can be added to indicate how well the description is observed. If
marine biologists use multiple filter operations to specify their queries, this can help overcome
failure in a single description.
Incorrect segmentation/registration: It is possible to find a tail where the head is. This can
also result in outliers in the vector of description values and in incorrect species recognition and
clustering.
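
To illustrate how certainty values and multiple filter operations could work together, the sketch below stores each biologist-suggested description as a (value, certainty) pair and only accepts a fish when every requested filter matches with sufficient certainty. The feature names, the data layout and the 0.5 default threshold are assumptions for this example.

```python
def matches(description, filters, min_certainty=0.5):
    """Check one fish description against all requested filters.

    description: dict mapping feature name to (value, certainty).
    filters: dict mapping feature name to the wanted value.
    """
    for feature, wanted in filters.items():
        value, certainty = description.get(feature, (None, 0.0))
        if value != wanted or certainty < min_certainty:
            return False
    return True


def query(descriptions, filters, min_certainty=0.5):
    """Return the ids of all fish whose descriptions pass every filter."""
    return [
        fish_id
        for fish_id, description in descriptions.items()
        if matches(description, filters, min_certainty)
    ]
```

Because every filter has to pass, a single wrongly observed description is less likely to put an unrelated fish into the result when several filters are combined.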

2.4 Fish Recognition Component

2.4.1 Purpose:

The Fish Recognition Component will recognise the species or family to which the fish belongs.
Because the fish is visible in multiple frames, these frames might have to be combined to obtain
more information about the fish. We also have to select the frame (time and place in video) that
contains the best appearance of the fish. This can be done using the contour, but also based on
the number of features found by the fish description. If the fish is too far from the camera, it
is possible that we can only determine the family to which the fish belongs and not the precise
species. In computer vision, recognition of objects is a difficult task with a lot of uncertainties.
These uncertainties can be expressed as probabilities or percentages, which allows us to communicate
a certainty value that we correctly recognised the fish. Another feature of this component can
be that marine biologists can make their own fish model, based on pre-labelled images or high
level features, to search for certain specific fish.
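
One possible way to combine multiple frames and to fall back from species to family is sketched below. It assumes that a classifier returns, for every frame, a probability for each candidate species; the averaging rule, the 0.6 threshold and the small species-to-family table are illustrative assumptions, not the method chosen for the project.

```python
from collections import defaultdict

# Illustrative lookup table from species to family.
FAMILY_OF = {
    "Amphiprion clarkii": "Pomacentridae",
    "Dascyllus reticulatus": "Pomacentridae",
    "Chaetodon lunulatus": "Chaetodontidae",
}


def combine_frames(frame_probabilities, species_threshold=0.6):
    """Average per-frame species probabilities and pick a label.

    frame_probabilities: one dict per frame mapping species to probability.
    Returns (level, name, certainty), where level is "species" or "family".
    """
    totals = defaultdict(float)
    for probabilities in frame_probabilities:
        for species, probability in probabilities.items():
            totals[species] += probability
    frame_count = len(frame_probabilities)
    mean = {species: total / frame_count for species, total in totals.items()}

    best_species = max(mean, key=mean.get)
    if mean[best_species] >= species_threshold:
        return ("species", best_species, mean[best_species])

    # Not certain enough about the species: report the family instead,
    # using the summed probability of all species in that family.
    family_scores = defaultdict(float)
    for species, probability in mean.items():
        family_scores[FAMILY_OF[species]] += probability
    best_family = max(family_scores, key=family_scores.get)
    return ("family", best_family, family_scores[best_family])
```

The returned certainty is the value that would be stored next to the label, so that later statistics can be restricted to confident recognitions.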

2.4.2 Input: (Fish location)

Fish (necessary)
Fish location for each frame (x,y,binary mask,date,time)(necessary)
All frames containing the fish (necessary)
Fish models (suggested)
Specific Fish models/filters (feature)
For each fish appearance, a vector of description values (necessary)
For each fish appearance, the descriptions suggested by marine biologists (suggested)
For each fish description suggested by marine biologists, a certainty value (suggested)

2.4.3 Output: (Fish labels)

For each fish a genus, family or species name (necessary)
For each fish a certainty score that the family/species is correct (suggested)
For each fish the best appearance (suggested)
For each fish a score for how good the appearance is (suggested)
Output of Specific Fish models/filters (suggested)

2.4.4 Evaluation:

Fish recognition can be evaluated based on labelled sets of fish. We can determine if the label
is correctly found, which allows us to show ranking plots. We can also determine a similarity
score given that a certain class is correctly found, which allows us to compute the false
positive and false negative rates. We are able to compute ROC curves for these similarity scores. It can
also be interesting to look at the recognition rates of the different species, since some species can be
easier to recognise than others. Note that a ground truth set of labelled fish is difficult to obtain,
because a marine biologist has to label the fish, which costs a lot of time. Labelling tools that
make predictions about the labels can make this task easier for biologists.
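
The data behind a ranking plot can be computed as in the following sketch, which assumes that the recogniser returns a score per candidate label for every labelled fish. The function names are hypothetical.

```python
def rank_of_true_label(scores, true_label):
    """Return the 1-based rank of the true label when sorting by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(true_label) + 1


def cumulative_rank_accuracy(results, max_rank):
    """Fraction of fish whose true label is within the top k, for k = 1..max_rank.

    results: list of (scores, true_label) pairs, where scores maps each
    candidate label to the recogniser's score for one fish.
    """
    ranks = [rank_of_true_label(scores, label) for scores, label in results]
    return [
        sum(1 for rank in ranks if rank <= k) / len(ranks)
        for k in range(1, max_rank + 1)
    ]
```

The first value is the ordinary recognition rate; the later values show how often the correct label is at least among the first few suggestions, which is also what a prediction-assisted labelling tool would rely on.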

2.4.5 Possible Failures:

Incorrect species: It is possible to label the species incorrectly, which can result in inaccurate
statistical information. A certainty score for the label can already help to correct for this. Furthermore,
some measurements on the appearance of the fish (resolution, blur, etc.) can indicate how
difficult the fish recognition was. It is also possible to run statistics on only high quality fish in
the database.
Incorrect best appearance: If the component finds the second best appearance, this will not be a big problem.
In worse cases, however, it can influence the fish recognition results and, more importantly, the
user interface. For instance, a user who would like to view the fish should not be bothered
with bad quality images of the fish.
Mistake by running Specific Fish models/filters: This gives outliers in the queries of the marine
biologists. A flexible training procedure in which users can select the outliers and build new filters
is a possible solution that has to be considered.

2.5 Fish Clustering Component

2.5.1 Purpose:

This component allows us to make clusters of fish that are very similar to each other. By
making these clusters, we also determine the fish that are not inside these clusters (outliers).
These outliers can be very interesting for marine biologists, because this methodology enables
us to recognise fish that are unknown to us. Another possibility is to allow marine biologists to
specify filters to search for certain interesting properties certain fish might have. This can allow
marine biologists to determine how certain fish species evolve, by looking at different variations
of for instance tails of a single fish species.
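
The idea of clusters and outliers can be illustrated with a very simple rule: a fish belongs to the nearest cluster centre unless it is further away than a chosen distance, in which case it is reported as an outlier. The sketch assumes that cluster centres have already been computed by some clustering method; the Euclidean distance and the threshold are assumptions for this example.

```python
import math


def nearest_centre(vector, centres):
    """Return (cluster_name, distance) of the centre closest to the vector."""
    distances = {name: math.dist(vector, centre) for name, centre in centres.items()}
    name = min(distances, key=distances.get)
    return name, distances[name]


def split_outliers(vectors, centres, max_distance):
    """Assign fish to clusters and collect the fish that fit no cluster.

    vectors: dict mapping fish id to its vector of description values.
    centres: dict mapping cluster name to its centre vector.
    Returns (assigned, outliers), where assigned maps cluster name to fish ids.
    """
    assigned = {}
    outliers = []
    for fish_id, vector in vectors.items():
        name, distance = nearest_centre(vector, centres)
        if distance <= max_distance:
            assigned.setdefault(name, []).append(fish_id)
        else:
            outliers.append(fish_id)
    return assigned, outliers
```

The choice of `max_distance` directly controls how many outliers are reported, which relates to the failure cases of this component.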

2.5.2 Input: (Fish location)

Fish (necessary)
Fish location for each frame (x,y,binary mask,date,time)(necessary)
All frames containing the fish (necessary)
Fish models (suggested)
Specific Fish models/filters (feature)
For each fish appearance, a vector of description values (necessary)
For each fish appearance, the descriptions suggested by marine biologists (suggested)
For each fish description suggested by marine biologists, a certainty value (suggested)

2.5.3 Output: (Fish clusters)

Similar Fish (feature)
Cluster name (feature)
Interesting Outliers, fish that are different (feature)
Graphic representation of clusters (suggested)

2.5.4 Evaluation:

The fish clustering can also be evaluated based on labelled sets of fish; however, it can also be
a good idea to look at other evaluation measures here. One idea would be to incorporate
feedback in the user interface on how useful the fish clustering is. Another idea is to look at
measurements for semi-supervised learning. The visualisation of clusters already gives a good
indication of whether these methods work, and labelling the correct and incorrect neighbours allows us to
evaluate the clustering methods from a human perspective.

2.5.5 Possible Failures:

Clusters are not meaningful: We anticipate that the clustering puts fish of the same species/family
together in one cluster. It can however be that the clustering results are not logical from the
human perspective. In this case, different clustering techniques or different descriptions have to
be used to improve this.
Clusters are too large: The clusters might be so large that it will be hard to visualise them. It
can also have the effect that the same species will have multiple clusters.
Too many outliers: The number of outliers is very large, which does not allow marine biologists
to look at the really interesting data because they probably only see the noisy data. A solution can
be found by using only descriptions with high certainty, so that more noisy data can be ignored.

2.6 Query Engine

2.6.1 Purpose:

The Query Engine allows searching through all the information stored in the storage facilities.
The Query Engine is closely related to the RDF/XML/SQL datastore definition, which
defines the manner in which the information is expected to be stored (probably in cooperation
with the information provider). After storing the data properly, the query engine should
be able to retrieve this information and convert it to a useful format for the user interface and/or
users. The challenge of the query interface is to deal with the large amount of data in the storage
facilities. Other challenges arise from the fact that not all data is trustworthy. The query engine
also has to deal with the fact that certain queries are not possible due to limitations in computer
resources.

2.6.2 Input: (Query Answer/Information Requests)

Database information (necessary)
Request for information (necessary)
Meta information (suggested)

2.6.3 Output: (Query information/Representable information)

Query information (necessary)
Representable information (necessary)
Link to other related sites (suggested)

2.6.4 Evaluation:

The query engine should probably be evaluated based on both computation time and usability.
The standard questions of the biologist can be used to test the computation time, while feedback
of users can help to evaluate the usability. Another interesting idea is to obtain statistical
information on the user behaviour and use this to improve and evaluate both the query
answering and the user interface.

2.6.5 Possible Failures:

Generate time consuming queries: The amount of data in the database makes it difficult to run
complicated queries, because they can require a lot of computational resources. The interface
can build in limitations so that users cannot make the system unavailable for the rest of the
world. Of course, we should also look for solutions on the database component side.
Generate queries with too many output records: The database might return millions of records
for a query, so the user interface should be able to deal with that. Automatic filter operations based
on previous queries can help users deal with all the data.
Generate queries with mistakes: The users are usually not aware that the system can also
make mistakes; only by being shown the image based results will they know that there are outliers.
Filtering and annotation of outliers can help to remove most of the outliers. Still, 100% accuracy
in fish detection/recognition is not expected, and this should be communicated to the users.
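
One simple limitation of the kind mentioned above is to cap the number of records per request and to return results page by page. The sketch below uses an in-memory SQLite database purely for illustration; the table name, column names and the limit of 100 records are assumptions and do not reflect the datastore definition of the project.

```python
import sqlite3

MAX_PAGE_SIZE = 100  # assumed hard limit on records per request


def build_demo_database(row_count):
    """Create an in-memory table with hypothetical recognition results."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE recognition ("
        "fish_id INTEGER PRIMARY KEY, species TEXT, certainty REAL)"
    )
    rows = [(i, "Dascyllus reticulatus", i / row_count) for i in range(row_count)]
    connection.executemany("INSERT INTO recognition VALUES (?, ?, ?)", rows)
    return connection


def fetch_page(connection, species, page, page_size=50):
    """Return one page of results, most certain first, never above the limit."""
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))
    cursor = connection.execute(
        "SELECT fish_id, certainty FROM recognition "
        "WHERE species = ? ORDER BY certainty DESC, fish_id "
        "LIMIT ? OFFSET ?",
        (species, page_size, page * page_size),
    )
    return cursor.fetchall()
```

Sorting by certainty means that the first pages a user sees contain the most reliable recognitions, which also reduces the number of visible mistakes.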

2.7 User Interface Component

2.7.1 Purpose:

The User Interface Component will allow the user to search for information and will then
present this information to the user. The user interface is connected to the query engine, which
will retrieve the information for the users. The information provided by the query interface can
be linked to other related projects, for instance the Taiwan fish database, fishbase.org and the Catalogue
of Life. The first purpose of the website is to provide an interface that lets the experts and other
visitors search in a relatively easy manner through the enormous amount of data. A special
area can be developed for specialists (like marine biologists), so that they can log in and
their searches will be remembered. Here, they can also ask for specific features they want to
add to the website or make special requests to add extra information to the storage facilities.
Of course, the website is also the portal of the project, making it an interesting medium for
communication. In this case, we can think about also explaining the underlying techniques,
education in both computer science and biology, setting up a fish labelling community or processing
other underwater cameras.

2.7.2 Input: (User input/Representable information)

User input (necessary)
Representable information from database (necessary)

2.7.3 Output: (Interface/User requests)

Web interface (necessary)
Search functions (necessary)
Request for information (necessary)
Special request interface (suggested)
Visualise fish clusters (feature)
Obtain extra information from related sites (suggested)

2.7.4 Evaluation:

The User Interface can be judged on usability and aesthetics. For both, feedback of users is
probably essential. A good evaluation of the interface in the beginning of the project is to
evaluate if the interface is able to answer the questions of the marine biologists.

2.7.5 Possible Failures:

Unclear/complicated interface: The user/biologist cannot find a certain description, or results from
the website are unclear to the users. To overcome this problem user feedback is necessary, so
next to the feedback we already have from our experts, we can add feedback about the website.
Unknown site for marine biologists: The website is not known to marine biologists other than
the people participating in the project. The website should be easy to find for all people interested
in this subject. Linking with other web resources helps, and correct content on the main site can
attract more people.
Broken links with related sites: Links to other websites, for instance to background information
about a certain species, do not work any more. Making a cache of the other websites can solve
this problem. Allowing users (biologists) to maintain the website is also an interesting idea.

2.8 Work-flow Component

2.8.1 Purpose:

The purpose of the Work-flow component is to organize the work that has to be done. Because
the different components have different requirements for CPU, memory and hard-disk
consumption, this can be a difficult task. The Work-flow component looks in the database
to identify what information can be processed and will execute the appropriate components.
The Work-flow component has to give priority to certain processes, like the user interface and
query engine if they are in use. The assignment of the priorities has to be done from a user
perspective. This means that users probably always want the user interface to be available, but it is
also important to perform the computer vision tasks when no user is querying information. The
Work-flow component can also get a special request from the User Interface, in which case it will create
a processing chain to generate the requested information. The Work-flow component, however,
needs detailed information about the other components, like version, purpose, average memory
usage, average CPU demands, average I/O demands and average run-time. This information has
to be contributed when adding a new component to the system, so that the Work-flow component
can handle different kinds of information requests and schedule the correct components for
them.
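
The role of the component information in scheduling can be illustrated with a small greedy scheduler: components are considered in priority order and started only if their average demands still fit within the free resources. The field names, the priority convention (lower runs first) and the greedy rule are assumptions for this sketch, not the planned scheduling scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentInfo:
    """Information a partner would contribute with a component (hypothetical fields)."""
    name: str
    version: str
    priority: int          # lower number is scheduled first
    avg_memory_mb: int
    avg_cpu_cores: float


def schedule(components, free_memory_mb, free_cpu_cores):
    """Return the names of the components that can run now, in start order."""
    selected = []
    for component in sorted(components, key=lambda c: c.priority):
        fits_memory = component.avg_memory_mb <= free_memory_mb
        fits_cpu = component.avg_cpu_cores <= free_cpu_cores
        if fits_memory and fits_cpu:
            selected.append(component.name)
            free_memory_mb -= component.avg_memory_mb
            free_cpu_cores -= component.avg_cpu_cores
    return selected
```

Giving the user interface and query engine the lowest priority numbers keeps them available, while the computer vision components use whatever resources remain.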

2.8.2 Input: (Special request + Query work)

Component Information (necessary)
Added Videos (necessary)
Added Fishes (necessary)
Special requests from User Interface (suggested)

2.8.3 Output: (Execute work)

Run components (necessary)
Arguments of component (add links to the information that needs to be processed) (necessary)

2.8.4 Evaluation:

The workflow component can be evaluated on the scheduling schemes it produces and the overall
effect these have on the amount of work processed. We can compare different schemes by having them handle
the exact same amount of data and identifying the scheme that performs this task in the most
efficient way. For the evaluation, we can also look at the processor utilisation.

2.8.5 Possible Failures:

Fish Detection/Recognition Component fails: In this case, no results or incomplete results are
stored in the database. This should however be detected by the workflow component, making
sure that the developers can check their code and post a newer version of the component.
Not enough resources: There are not enough resources to keep the system working. For
instance, it is not possible to run both the fish detection and recognition components to process
all the videos with fish and still keep the user interface working. In this case, alerts should
be generated. These alerts should name the resources that are the bottleneck, so that
developers can find other solutions which require fewer or other resources. We do not have to
be able to process all the data in the beginning at the same rate at which we acquire it. In the final
version, we aim to reach this goal.
A special request requires too many resources: The amount of time/resources needed for a certain
special request is too large. In this case the user should be
informed and alternatives can be offered to the user.
Special requests are not supported by components: The user should be informed that the request
is impossible at the moment, while a log can be created so that developers can observe which
special requests are important for the users.

2.9 Database Component

2.9.1 Purpose:

The database component allows the different components to store and query information. There
will be different kinds of information, like videos, images, numerical data and strings. The
database will follow the structure defined by CWI. For both the fish detection and
recognition components, there will be a simple interface to query lists of unprocessed videos and
fish. The query interface to these components can be simple, while CWI has a more powerful
query interface for the user interface. The database component allows all the other components
to share the processed information with each other, but every component must use its own unique
identifier when it stores information. This allows us to see which component
contributed the information.
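
The simple interface for the detection and recognition components could look roughly like the sketch below, which uses an in-memory SQLite database only for illustration. The table layout, column names and function names are hypothetical; the real structure is the one defined by CWI. The sketch shows the two points made above: a query for unprocessed videos, and a mandatory component identifier on every stored record.

```python
import sqlite3


def open_store():
    """Create an in-memory store with a hypothetical, minimal schema."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE video ("
        "video_id INTEGER PRIMARY KEY, path TEXT NOT NULL, "
        "processed INTEGER NOT NULL DEFAULT 0)"
    )
    connection.execute(
        "CREATE TABLE fish_location ("
        "fish_id INTEGER NOT NULL, video_id INTEGER NOT NULL, "
        "frame INTEGER NOT NULL, x INTEGER NOT NULL, y INTEGER NOT NULL, "
        "component_id TEXT NOT NULL)"  # every record must name its producer
    )
    return connection


def unprocessed_videos(connection):
    """Return (video_id, path) for every video that still has to be processed."""
    cursor = connection.execute(
        "SELECT video_id, path FROM video WHERE processed = 0 ORDER BY video_id"
    )
    return cursor.fetchall()


def store_location(connection, component_id, fish_id, video_id, frame, x, y):
    """Store one fish location together with the identifier of the component."""
    connection.execute(
        "INSERT INTO fish_location VALUES (?, ?, ?, ?, ?, ?)",
        (fish_id, video_id, frame, x, y, component_id),
    )


def mark_processed(connection, video_id):
    """Flag a video as processed so it is no longer offered to components."""
    connection.execute(
        "UPDATE video SET processed = 1 WHERE video_id = ?", (video_id,)
    )
```

Because `component_id` is declared `NOT NULL`, an insert without an identifier is rejected by the database itself rather than relying on each component to behave correctly.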

2.9.2 Input: (Store Information/Query)

Videos (necessary)
Fish Location (necessary)
Fish Tracking (necessary)
Fish Description (necessary)
Fish Recognition (necessary)
Fish Clustering (necessary)
Component which inserted the information (necessary)

2.9.3 Output: (Get Information/Answer Query)

Videos (frame based) (necessary)
Fish Location (necessary)
Fish Tracking (necessary)
Fish Description (necessary)
Fish Recognition (necessary)
Fish Clustering (necessary)
Component which inserted the information (necessary)

2.9.4 Evaluation:

The database can be tested by measuring the time it takes to query certain records. This can be
performed together with testing the query engine. The queries defined by biologists give a good
starting point for evaluation of the database. Generating random queries can also give some
insight into the system, but that might test the capabilities of the entire system more than the
database component. Finally, testing the correctness and data loss of the information in the database
gives insight into the robustness of the chosen database solution.

2.9.5 Possible Failures:

Time consuming queries: Although handling such queries is one of the challenges of the project, in some cases
there are not enough resources to compute them. In these cases, we can run queries on part of the
database. Detection of these queries is important because we want the system to remain available for
other users. Communicating this problem through the user interface, together with alternative
options, can help make the user aware of it.
Storage full: This can be a serious problem. Several solutions are possible, like using more hard
discs or using (better) compression methods.
Hard-disc broken: In the case of hard-disc failures, it is necessary to have a backup at all
times. Distributed solutions can already provide us with a framework which automatically
deals with these kinds of failures.
Hardware/Server unavailable: The machine that contains
the database might be unavailable. In this case backup servers are needed. Some distributed
frameworks already have solutions for these cases.
Too many read/write operations simultaneously: The framework cannot deal with both the
queries and the storage requests to the database. In this case the work-flow manager should be
able to stop/decrease the workload. It can be handy to make a profile of the expected workload
and schedule the work done by additional components according to that profile.

Topic revision: r2 - 14 Mar 2011 - 14:08:57 - Main.bboom
 