Research Projects

* Note: All the texts, images, and videos here are copyrighted and belong to the author. Anyone republishing this work must cite it correctly and request permission first. Contact email: omar DOT florez AT aggiemail DOT usu DOT edu

Traffic Understanding in Video Sequences

A sequence that contains three types of activities, G1, G2, and G3, in different frames. Only moving objects yield activities based on events, represented as points in each group. We want to find groups of objects performing the same activity, represented as points with different colors. Our goal in this project is to automatically learn an unknown number of types of activities G = {G1, G2, …} such that they can describe scenes in video data. As an example, the following rule can be extracted from the video to answer the question: “Does the current scene correspond to a red light scenario?”

{#cars_moving_from_up_to_down in [4,6), #cars_turning_right in [1,5)} → {#people_walking_from_right_to_left in [1,4)}

The above rule suggests that a strong relationship exists between cars going straight, some people crossing the street from left to right, and relatively few cars starting to turn right because a red light is active. We introduce a hierarchical Dirichlet Process that trains an infinite number of HMMs (Hidden Markov Models) to recognize types of activities (topics) in the video. When we analyzed the maximum support of every temporal transaction, we noticed that the peak values correspond to the most significant scenes in the video in terms of frequency, as shown below.
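The support of a rule like the one above can be computed by counting the temporal transactions whose activity counts fall inside every interval of the rule. The following is a minimal sketch in Python, assuming each transaction is a dictionary of per-window activity counts; the transaction values and the helper names (matches, support, confidence) are made up for illustration and are not data or code from the project.

```python
# Hypothetical temporal transactions: activity counts observed in one time window each.
transactions = [
    {"cars_moving_from_up_to_down": 5, "cars_turning_right": 2, "people_walking_from_right_to_left": 3},
    {"cars_moving_from_up_to_down": 4, "cars_turning_right": 1, "people_walking_from_right_to_left": 1},
    {"cars_moving_from_up_to_down": 0, "cars_turning_right": 6, "people_walking_from_right_to_left": 0},
    {"cars_moving_from_up_to_down": 5, "cars_turning_right": 3, "people_walking_from_right_to_left": 7},
]

# The rule from the text, with half-open intervals [low, high).
antecedent = {"cars_moving_from_up_to_down": (4, 6), "cars_turning_right": (1, 5)}
consequent = {"people_walking_from_right_to_left": (1, 4)}


def matches(transaction, itemset):
    """True if every named count lies in its half-open interval [low, high)."""
    return all(low <= transaction.get(name, 0) < high
               for name, (low, high) in itemset.items())


def support(itemset):
    """Fraction of transactions that satisfy every interval in the itemset."""
    return sum(matches(t, itemset) for t in transactions) / len(transactions)


def confidence(left, right):
    """Support of the whole rule divided by the support of its left-hand side."""
    left_support = support(left)
    return support({**left, **right}) / left_support if left_support > 0 else 0.0


rule_support = support({**antecedent, **consequent})
rule_confidence = confidence(antecedent, consequent)
```

With these toy transactions, two of the four windows satisfy the whole rule and three satisfy its left-hand side.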




Very fast retrieval of human motion in video data



Retrieving human motion in video data requires a good description based on its most important features. In this project we want to represent an activity performed in a video with events that are robust against camera motion, illumination changes, and shadows. Our approach is summarized in the next figure.



An event can be understood as a significant change of intensity around a spatio-temporal neighborhood, as shown below.

However, the way events are distributed is tied to the pixels with the largest change of intensity. Hence, we need to find those pixels by computing first-order derivatives with respect to space and time. Different derivatives give different results, but in this research we found that the derivative of motion with respect to time can effectively isolate the regions that are most important to analyze. We can see this in the next image.
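As a rough illustration of a temporal derivative, the sketch below takes the absolute forward difference between consecutive grayscale frames. The finite-difference operator, the function name temporal_derivative, and the toy 3x3 clip are assumptions made for this example; the text does not say which derivative filter the project uses.

```python
def temporal_derivative(frames):
    """Absolute forward difference |I(x, y, t+1) - I(x, y, t)| for consecutive frames."""
    return [
        [[abs(b - a) for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


# Toy 3x3 grayscale clip: a bright pixel moves one step to the right per frame.
frames = [
    [[0, 0, 0], [9, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 9], [0, 0, 0]],
]
motion = temporal_derivative(frames)
```

Only the pixels that the bright spot leaves or enters get a non-zero response, so static background is suppressed.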

Hence, our task reduces to finding the local maxima of that surface in order to describe the motion with a set of important points. When we normalize the events with respect to the center of the actor, we obtain a point distribution that forms clusters. The shape and number of clusters describe how events are distributed for a certain activity. We discretize that information into a grid with a fixed resolution, but we do not know in advance which resolution describes the information correctly.
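The peak-picking and normalization steps can be sketched as follows. This is a minimal example under stated assumptions: a peak is a cell strictly larger than its 8 neighbours, and the centroid of the detected events stands in for the center of the actor. Both choices and both function names are illustrative, not taken from the project.

```python
def local_maxima(surface, threshold=0.0):
    """Return (row, col) positions strictly greater than all of their 8 neighbours."""
    rows, cols = len(surface), len(surface[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = surface[r][c]
            if value <= threshold:
                continue
            neighbours = [
                surface[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def center_on_actor(points):
    """Translate points so that their centroid (assumed actor center) is the origin."""
    if not points:
        return []
    mean_r = sum(r for r, _ in points) / len(points)
    mean_c = sum(c for _, c in points) / len(points)
    return [(r - mean_r, c - mean_c) for r, c in points]


# Toy response surface with two clear peaks.
surface = [
    [0, 1, 0, 0, 0],
    [1, 5, 1, 0, 0],
    [0, 1, 0, 2, 0],
    [0, 0, 2, 7, 2],
    [0, 0, 0, 2, 0],
]
events = local_maxima(surface)
centered = center_on_actor(events)
```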

Our experiments showed that all these resolutions are important. Coarse resolutions detect large variations and fine resolutions detect subtle changes in similar activities. Thus, we form a hierarchical description from those multiple resolutions and describe an activity in multiple layers.
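One way to picture such a multi-layer description is to histogram the centered events on grids of increasing resolution and concatenate the results. The sketch below assumes events already scaled into the square [-1, 1) x [-1, 1) and grid levels of 1x1, 2x2, and 4x4 cells; the levels, the extent, and the function names are illustrative choices rather than the project's actual parameters.

```python
def grid_histogram(points, cells, extent=1.0):
    """Count points per cell of a cells x cells grid covering [-extent, extent)^2."""
    histogram = [0] * (cells * cells)
    width = 2 * extent / cells
    for x, y in points:
        if -extent <= x < extent and -extent <= y < extent:
            # Clamp to guard against floating-point rounding at the upper edge.
            col = min(cells - 1, int((x + extent) / width))
            row = min(cells - 1, int((y + extent) / width))
            histogram[row * cells + col] += 1
    return histogram


def pyramid_descriptor(points, levels=(1, 2, 4), extent=1.0):
    """Concatenate histograms from coarse to fine into one fixed-length vector."""
    descriptor = []
    for cells in levels:
        descriptor.extend(grid_histogram(points, cells, extent))
    return descriptor


points = [(-0.5, -0.5), (0.5, 0.5), (0.6, 0.4)]
descriptor = pyramid_descriptor(points)
```

The descriptor length depends only on the chosen levels (here 1 + 4 + 16 = 21 entries), not on the number of events. At the 2x2 level the last two points share a cell, while the 4x4 level separates them.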

The result is a vector of fixed dimensionality that summarizes all the information contained in the execution of the activity. We use a standard 1-NN classifier to quickly retrieve the videos that contain the same information (please check the video shown at the top of this page).
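A 1-NN lookup over such fixed-length vectors can be sketched in a few lines. The Euclidean distance, the labels, and the vectors below are assumptions for illustration; the text does not state which distance the project uses.

```python
import math


def nearest_neighbor(query, database):
    """Return the (label, vector) pair whose vector is closest to the query."""
    return min(database, key=lambda item: math.dist(query, item[1]))


# Hypothetical activity descriptors, one per indexed video.
database = [
    ("walking", [3, 1, 0, 0, 2]),
    ("waving", [3, 0, 2, 1, 0]),
    ("jumping", [3, 0, 0, 3, 0]),
]
label, _ = nearest_neighbor([3, 1, 0, 1, 1], database)
```

This version scans every stored vector, so its cost grows linearly with the number of videos.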

Sublinear Similarity Search of Human Motion via Timeseries.


Timeseries are an important representation of the behavior of processes over uniform time intervals. They are widely used in various fields such as robotics, gene expression, electrocardiograms, stock market quotes, and multimedia data. In this work, we focus on an important and interesting special case: the representation of motion, recorded from live actors and described as a timeseries.

The indexing, querying, and classification of motion-related timeseries is an open problem. The above figure illustrates the use of timeseries to represent human motion. The timeseries are generated by sensors placed on the body of an actor. Each sensor measures an aspect of the motion, for instance angular velocity or spatial position. As the data is collected over time, it forms a timeseries that captures part of the actor's entire motion. This process is visualized in the next video.

Motion-related timeseries have features that are not commonly present in traditional types of vector data, which create additional indexing challenges as described in more detail below.

  • High and variable dimensionality (proportional to the length of the actor's movement)
  • Euclidean distance is not applicable (since patterns have different lengths)
  • DTW (Dynamic Time Warping), the most common similarity measure between timeseries, does not define a metric space
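To make the last two points concrete, here is a minimal sketch of the textbook DTW recurrence with |x - y| as the local cost. It compares sequences of different lengths, and the three toy sequences show a case where the triangle inequality fails. The sequences were chosen for illustration and are not motion data.

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two sequences, with |x - y| as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


x, y, z = [0], [1, 2], [2, 3, 3]
d_xy, d_yz, d_xz = dtw(x, y), dtw(y, z), dtw(x, z)
violates_triangle_inequality = d_xz > d_xy + d_yz
```

Here dtw(x, z) is 8 while dtw(x, y) + dtw(y, z) is 6, so index structures that rely on the triangle inequality cannot be applied to DTW directly.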

Our algorithm generalizes the dot product operator to hash timeseries of variable dimensionality without assuming constant dimensionality or requiring dimensionality normalization, unlike other approaches. By avoiding normalization, our hashing algorithm preserves more timeseries information and improves retrieval accuracy, and by hashing timeseries we can achieve sublinear computation time for most searches.

The hash-based approach exploits the idea that if two vectors are close together in their original space, then after a scalar projection operation that maps each vector to a point on a line, their projections will remain close. If we quantize the line by partitioning it into intervals of the same width (hash buckets), then we would expect similar vectors to be mapped into the same line interval. The example given below illustrates this idea.
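The projection-and-quantization idea can be sketched for ordinary fixed-length vectors as h(v) = floor((a · v + b) / w), where a is a random direction, b a random offset, and w the bucket width. Note that this is the standard fixed-dimensional scheme, not the generalization to variable-length timeseries described above; the function name make_hash and all parameter values are assumptions for the example.

```python
import math
import random


def make_hash(dim, width, seed=0):
    """Build one hash function h(v) = floor((a . v + b) / width)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, width)                    # random offset of the bucket grid

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / width)

    return h


h = make_hash(dim=3, width=4.0)
bucket_p = h([1.0, 2.0, 3.0])
bucket_near = h([1.01, 2.0, 3.0])
```

Two vectors whose projections differ by less than the bucket width land in the same bucket or in adjacent ones; whether they share a bucket depends on where the random offset places the bucket boundaries.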

When multiple hash tables are considered in parallel to increase the likelihood of finding the right answer in one of the hash tables (as shown in the next figure), we obtain an index that performs well when querying real human motion datasets in sublinear computational time.
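A multi-table index of this kind can be sketched as below, again for fixed-length vectors and with Euclidean distance used to rank the candidates. The class name LSHIndex, its parameters, and the sample vectors are hypothetical; the project's index works on variable-length timeseries and may differ in every detail.

```python
import math
import random
from collections import defaultdict


class LSHIndex:
    """Several hash tables, each keyed by a tuple of quantized random projections."""

    def __init__(self, dim, num_tables=4, hashes_per_table=2, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Each table gets its own list of (direction, offset) pairs.
        self.params = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _key(self, table_params, v):
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
            for a, b in table_params
        )

    def add(self, name, v):
        self.vectors[name] = v
        for table, table_params in zip(self.tables, self.params):
            table[self._key(table_params, v)].append(name)

    def query(self, v):
        """Collect candidates from the matching bucket of every table, then rank them."""
        candidates = set()
        for table, table_params in zip(self.tables, self.params):
            candidates.update(table.get(self._key(table_params, v), []))
        if not candidates:
            return None
        return min(candidates, key=lambda name: math.dist(v, self.vectors[name]))


index = LSHIndex(dim=3)
index.add("walk", [1.0, 2.0, 3.0])
index.add("run", [40.0, -20.0, 5.0])
best = index.query([1.0, 2.0, 3.0])
```

Only the vectors that share a bucket with the query in at least one table are compared exactly, which is where the savings over a full scan come from.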




Extraction of interpretable time series from Video data.


We propose a novel algorithm to extract interpretable time series from video to characterize motion embedded in the video. Our method relies on describing the motion exposed in a video as a collection of spatiotemporal gradients. Each gradient models a unique position in the video representing high variations both in space and time. The variation is measured as the change of one point with respect to its spatiotemporal neighborhood. Rather than obtaining a coarse sampling of the motion by taking one event per frame, we obtain a continuous function by considering all the events that fall in the short-time slicing window of time length equal to the value of the temporal variance. The result is a composed time series that represents the motion in the video independent of rotation and scale. The advantages of our approach are twofold: (1) We avoid tracking specific points through the frames of a video stream, and instead consider the distribution of general unit motions over time. (2) We do not require a learning process to categorize movements, since our method to match similar motions is based only on distances between time series. As an empirical demonstration of the viability of our method, we are able to cluster human motions contained in 114 videos into hand-based motions and foot-based motions with precisions of 86.0% and 75.9%, respectively.




GPS applications: Mobile browsing USU campus with cellphones

GPS technology, now very common in modern cellphones, can provide valuable information for browsing specific regions such as university campuses, airports, and downtowns.

In this video, we use position information of buildings and services to visually explore the Utah State University main campus. You may want to compare this application with the existing one at http://www.usu.edu/map/.

A map like this is essential for people who visit the campus for the first time, such as freshmen and international students. This is joint work by Omar U. Florez with Stephanie Reeder, Jordan Buxton, and Darren Cluff at Utah State University.




Voice2SQL.

For many people, traditional interfaces (mouse, keyboard) are not suitable, e.g. blind people or people with Parkinson's disease. In that context, using voice as the input method and natural language as the language structure seems to be a more effective interface. Thus, the goal of Voice2SQL is to retrieve information from databases using spoken natural language. Roughly speaking, a one-directional translation between two languages is achieved: spoken language to SQL statements. A SOM (Self-Organizing Map) neural network is used to model the relation between phonemes present in the voice stream. In the figure you can see how similar phonemes are placed close together on the map. Words are then reconstructed as a sequence of phonemes over time: K-O-H-O-N-E-N.




Scalable Fingerprint Recognition.

Since large databases of fingerprint images are now available, methods for searching these databases are required. Indexing is the most common method to organize objects in databases, but it is normally restricted to exact comparisons of index objects (e.g., numbers, strings). However, if a distance function can be defined on a set of objects, we can organize them in a Metric Space. In this project we organize fingerprint images based on the distance between their descriptors. The result is a Metric Space that can be pruned by using heuristics such as the triangle inequality, pivots, hyperspherical regions, and distance histograms, with the main goal of approximating query results with fast search times. We perform complex types of queries such as: “Given the query image q, retrieve the 3 most similar images to q in the database” or “Given the query image q, retrieve the images which are within a distance d of q”.
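Pivot-based pruning with the triangle inequality can be sketched as follows. For any pivot p, |d(q, p) - d(o, p)| is a lower bound on d(q, o), so an object whose bound already exceeds the query radius can be discarded without computing its real distance. The class name PivotIndex, the 2-D points standing in for fingerprint descriptors, and the Euclidean distance are assumptions for this example, and this exact range query does not cover the approximate search mentioned above.

```python
import math


class PivotIndex:
    """Range queries in a metric space, pruned with precomputed pivot distances."""

    def __init__(self, objects, pivots, distance=math.dist):
        self.objects = objects
        self.pivots = pivots
        self.distance = distance
        # d(object, pivot) is computed once, at build time.
        self.table = [[distance(o, p) for p in pivots] for o in objects]
        self.computed = 0  # real object distances evaluated by the last query

    def range_query(self, q, radius):
        q_to_pivots = [self.distance(q, p) for p in self.pivots]
        self.computed = 0
        results = []
        for obj, row in zip(self.objects, self.table):
            # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p.
            lower_bound = max(abs(dq - do) for dq, do in zip(q_to_pivots, row))
            if lower_bound > radius:
                continue  # cannot lie within the radius, so skip the real distance
            self.computed += 1
            if self.distance(q, obj) <= radius:
                results.append(obj)
        return results


# Hypothetical 2-D descriptors forming two well-separated groups.
descriptors = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 9.0)]
index = PivotIndex(descriptors, pivots=[(0.0, 0.0)])
hits = index.range_query((0.5, 0.0), radius=1.0)
```

In this toy run only two of the four real distances are evaluated; the far group is rejected from the pivot table alone.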




Self-healing and fault-tolerance properties in Multi Agent Systems.

Many works in Intrusion Detection Systems (IDS) use agents, commonly organized hierarchically, to delegate roles in the system. One structural problem in these approaches is how to keep working when parent nodes fail. This situation often implies that the nodes under the authority of the failed parent nodes stop working. This project introduces the communication subgraph and explores the use of features present in agents, such as collaboration, negotiation, and coalition formation, to obtain self-healing properties in a Multi Agent System. The term self-healing refers to the capability of automatically recovering from failures and mistakes. The side figure shows the communication subgraph between the Network agent (N) and Maintenance agents (M). Each Maintenance agent represents a LAN, and extra paths between agents provide fault tolerance in case the Network agent fails and paths wMxN are not available.

 
projects.txt · Last modified: 2010/06/24 18:35 by omarflorez