Graduation Project (Thesis): Foreign Reference Material and Translation
Translation title: VEHICLE DETECTION AND TRACKING
Student name:
Major:
College:
Supervisor:
Professional title:
Chapter 5
Vehicle Detection and Tracking
5.1 Introduction
Statistics show that about 60% of rear-end crashes could be avoided if the driver had additional warning time. According to the Ministry of Public Security of P.R. China, there were 567,753 reported road traffic accidents in 2004, and about 80% of the severe police-reported accidents were vehicle–vehicle crashes. Almost two-fifths of these crashes resulted in an injury, and over 2% resulted in a death. Clearly, vehicle detection is an important research area of intelligent transportation systems [2, 11, 20]. It is used in, among others, adaptive cruise control (ACC), driver assistance systems, automated visual traffic surveillance (AVTS), and self-guided vehicles. However, robust vehicle detection in real-world traffic scenes is challenging.
Currently, IDASW systems based on radar cost more than those based on machine vision, and they suffer from a narrow field of view and poor lateral resolution. In adaptive cruise control (ACC) systems, a camera can detect a vehicle cutting in or overtaking from an adjacent lane earlier than a radar can. For these reasons, radar-based systems are more difficult to apply in practical IDASW systems. Consequently, robust, real-time vehicle detection in video is attracting increasing attention from researchers all over the world [2, 4, 14].
To detect on-road vehicles in time, this chapter introduces a multi-resolution hypothesis-validation structure. Inspired by A. Broggi [2], we extract three ROIs, a near one, one in the middle, and a far one, from a 640 × 480 image. His approach uses fixed regions at the cost of flexibility; we remove this limitation and build a simple and efficient hypothesis-validation structure consisting of the three steps described below:
1. ROI determination: We generate candidate ROIs using the vanishing point of the road in the original image.
2. Vehicle hypothesis generation for each ROI using horizontal and vertical edge detection: We create multi-resolution vehicle hypotheses based on the preceding candidate regions. From the analysis of edge histograms, we generate hypotheses for each ROI and combine them into a single list.
3. Hypothesis validation using Gabor features and SVM classifiers: We validate each hypothesis using the boosted Gabor features of 9 sub-windows and SVM classifiers. Based on the classifiers' decisions, we determine whether each hypothesis represents a vehicle or a non-vehicle.
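The three-step structure above can be sketched as a small pipeline in which each step is a pluggable function. All names below are illustrative stand-ins, not the chapter's implementation:

```python
def detect_vehicles(image, determine_rois, generate_hypotheses, is_vehicle):
    """Three-step hypothesis-validation pipeline.

    determine_rois:      step 1 -- candidate ROIs from the vanishing point.
    generate_hypotheses: step 2 -- edge-based hypotheses for one ROI.
    is_vehicle:          step 3 -- Gabor-feature + SVM validation of one hypothesis.
    """
    # Step 1: ROI determination.
    rois = determine_rois(image)

    # Step 2: generate hypotheses per ROI and merge into a single list.
    hypotheses = []
    for roi in rois:
        hypotheses.extend(generate_hypotheses(image, roi))

    # Step 3: keep only hypotheses the classifier judges to be vehicles.
    return [h for h in hypotheses if is_vehicle(h)]
```

Keeping the steps as parameters makes the structure testable with trivial stubs before the real detector and classifier are plugged in.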
5.2 Related Work
Hypotheses are generated using simple features such as color, horizontal and/or vertical edges, symmetry [2, 5], motion, and stereo visual cues. Zehang Sun proposed a multi-scale hypothesis method in which the original image was down-sampled to 320 × 240, 160 × 120, and 80 × 60. Vehicle hypotheses were generated by combining the horizontal and vertical edges of these three levels, and this multi-scale method greatly reduced random noise. The approach can generate multiple hypothesis objects, but a near vehicle may prevent a far vehicle from being detected; in that case the method fails to generate the corresponding hypothesis for the far vehicle, reducing the detection rate.
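Sun's three-level scheme amounts to an image pyramid with edge maps at each level. The sketch below uses plain NumPy, with 2 × 2 block averaging and a gradient operator as stand-ins for whatever down-sampling and edge detector the original used:

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 block averaging (e.g. 640x480 -> 320x240)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def edge_maps(img, levels=3):
    """Vertical- and horizontal-edge magnitude at each pyramid level.

    Returns a list of (vertical_edges, horizontal_edges) pairs, one per
    level; combining responses across levels suppresses random noise.
    """
    maps = []
    for _ in range(levels):
        gy, gx = np.gradient(img)            # gx responds to vertical edges
        maps.append((np.abs(gx), np.abs(gy)))
        img = downsample(img)                # next, coarser level
    return maps
```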
B. Leibe et al. presented a video-based 3D dynamic scene analysis system for a moving vehicle [9] which integrated scene geometry estimation, 2D vehicle and pedestrian detection, and 3D localization and trajectory estimation. Impressively, this work presented a multi-view/multi-category object detection approach for real-world traffic scenes. Furthermore, 2D vehicle and pedestrian detections are converted into 3D observations.
Vehicle symmetry is an important cue in vehicle detection and tracking. Inspired by the voting of the Hough Transform, Yue Du et al. proposed a vehicle-following approach that finds the symmetry axis of a vehicle [5]; however, their approach has several limitations, such as a large computing burden, and it generates only one object hypothesis using the best symmetry. Alberto Broggi introduced a multi-resolution vehicle detection approach, and proposed dividing the image into three fixed ROIs: one near the host car, one far from the host car, and one in the middle [2]. This approach overcomes the limitation of detecting only a single vehicle in a predefined region of the image, but it needs to compute the symmetry axis, which prevents real-time operation.
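The symmetry cue, and its computational cost, can be illustrated with a toy mirror-correlation search; this is a hypothetical sketch, not Du et al.'s actual voting scheme:

```python
import numpy as np

def best_symmetry_axis(edges, half_width):
    """Return the column whose mirrored neighborhoods match best.

    edges:      2D array of edge magnitudes.
    half_width: number of pixels compared on each side of a candidate axis.

    The exhaustive column scan makes the heavy computing burden of
    symmetry-based detection apparent: every candidate axis requires a
    full mirror-correlation over its neighborhood.
    """
    h, w = edges.shape
    best_col, best_score = None, -1.0
    for c in range(half_width, w - half_width):
        left = edges[:, c - half_width:c]
        right = edges[:, c + 1:c + 1 + half_width][:, ::-1]  # mirrored
        score = float((left * right).sum())  # mirror correlation
        if score > best_score:
            best_col, best_score = c, score
    return best_col
```

Note that, like the approach criticized above, this returns only the single best axis, i.e. one object hypothesis per image.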
D. Gabor first proposed the 1D Gabor function in 1946, and J. G. Daugman later extended it to 2D. A Gabor filter is a local bandpass filter that can reach the theoretical limit of joint resolution in the spatial and frequency domains simultaneously. Consequently, Gabor filters have been successfully applied to object representation in various computer vision applications, such as texture segmentation and recognition [18], face recognition [19], scene recognition, and vehicle detection [14].
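For reference, the real part of a standard 2D Gabor filter, a Gaussian envelope modulating a cosine carrier, can be generated as follows. Parameter conventions follow the common textbook form rather than any specific paper cited here:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real part of a 2D Gabor filter.

    size:       kernel width/height in pixels (odd).
    wavelength: period of the cosine carrier in pixels.
    theta:      filter orientation in radians.
    sigma:      width of the Gaussian envelope.
    gamma:      spatial aspect ratio of the envelope.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates into the filter's orientation.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier
```

Convolving an image window with such a kernel gives the filter response whose magnitude (and phase) serve as features in the applications listed above.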
The basic issue with a Gabor filter is how to select the parameters so that the filter responds mainly to the object of interest, such as a vehicle or a pedestrian. Accurate detection occurs only if the parameters defining the Gabor filters are well selected. Three main approaches have been proposed in the literature for selecting Gabor filters for object representation: manual selection, Gabor filter bank design (including filter design) [18], and learning approaches [13, 14, 16, 19]. In [1], Ilkka Autio proposed an approach based on manual selection: an initial set of Gabor filters was experimentally selected from a larger set and then manually tuned. In general, Gabor filter bank design defines a small filter pool and determines the parameters of its filters independently of the application domain; moreover, the bandwidth of such Gabor filter design approaches cannot be determined autonomously. In image browsing and retrieval, one strategy ensures that the half-peak magnitude supports of the filter responses in the frequency domain touch each other, using a filter bank with 6 directions and 4 scales to compute texture features [12]. Because the filter bank is independent of the application domain, such an approach can be used for object classification, detection, and tracking. The main problems of the filter design approach are small filter pools, a lack of prior knowledge, and poor performance. Learning-based approaches select the Gabor filters according to the application domain. Du-Ming Tsai proposed an optimization algorithm that uses simulated annealing to obtain the best Gabor filter for texture segmentation [16]. A face recognition application using a strong classifier cascaded from weak classifiers was proposed by S. Z. Li; in his approach, the weak classifiers were constructed from both the magnitude and phase features of Gabor filters [19].
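A fixed filter bank of the kind used in [12] is just a grid over orientation and scale. The sketch below enumerates the (orientation, wavelength) pairs; the base wavelength and the octave spacing between scales are illustrative choices, not values taken from the cited work:

```python
import math

def gabor_bank_params(orientations=6, scales=4, base_wavelength=4.0):
    """Enumerate (theta, wavelength) pairs for a fixed Gabor filter bank.

    Mirrors the 6-direction, 4-scale design mentioned above: directions
    are evenly spaced over [0, pi) and each scale doubles the wavelength.
    """
    params = []
    for s in range(scales):
        wavelength = base_wavelength * (2 ** s)   # one octave per scale
        for o in range(orientations):
            theta = o * math.pi / orientations    # evenly spaced directions
            params.append((theta, wavelength))
    return params
```

Because these parameters are fixed in advance, the bank is independent of any application domain, which is exactly the property (and the limitation) discussed above.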
In terms of vehicle detection, Alberto Broggi introduced a multi-resolution detection approach that divides the image into three fixed ROIs [2]. His approach allows detecting multiple vehicles in a predefined region. However, it detects vehicles via a symmetry axis, which is not only time-consuming to compute but also somewhat unreliable as a feature. In [14], Zehang Sun proposed an Evolutionary Gabor Filter Optimization (EGFO) approach for vehicle detection, using the statistical features of the responses of the selected Gabor filters to classify test images with a trained SVM classifier. Although good performance has been reported, EGFO incurs a large computational cost for Gabor filter selection. Moreover, each Gabor filter is optimized for a complete image but applied to each sub-window of a test image, which reduces the quality of the representation.
Vehicle Active Safety Systems (VASS) impose strict timing requirements on pedestrian and vehicle detection. Accordingly, our approach detects vehicles only in ROIs, allowing a real-time implementation. The ROI approach largely prevents a near car from hiding a far car. All hypotheses are generated in these regions, and the positions of vehicles are validated by SVM classifiers with Gabor features.
5.3 Generating Candidate ROIs
Inspired by A. Broggi [2], we extract three ROIs, a near one, one in the middle, and a far one, from a 640 × 480 image, but his approach uses fixed regions at the cost of flexibility. In our approach, ROIs are extracted using lane markings. In a structured lane, we detect the vanishing point using the lane edges. For real-time processing, we use a simple vanishing point detector rather than a complex one. Discontinuity and noise problems can be addressed by combining, for instance, 10 subsequent images (see Fig. 5.1(a)).

Fig. 5.1 (a) Lane edges of a single frame; (b) overlapped lane edges

Edge detection is performed on combined images
consisting of 10 overlapping subsequent images, and the equations of the two lane lines are deduced from a voting procedure like the Hough Transform by analyzing horizontal and vertical edges. Four random points $P_{d,i}$, $d = l$ or $r$, $i = 0, \ldots, 3$, are selected on each lane line, and the tangent direction between each pair of adjacent points is obtained by

$$\theta_{d,i} = \arctan\frac{y_{d,i+1} - y_{d,i}}{x_{d,i+1} - x_{d,i}}, \qquad i = 0, 1, 2; \quad d = r \text{ or } l. \qquad (5.1)$$

The tangent direction of each lane line is calculated as the average of the above tangent angles:

$$\bar{\theta}_d = \frac{1}{3}\sum_{i=0}^{2}\theta_{d,i}, \qquad d = r \text{ or } l. \qquad (5.2)$$
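Assuming (5.1) takes the tangent angle of each pair of adjacent sampled points and (5.2) averages them, the computation is a few lines:

```python
import math

def lane_direction(points):
    """Average tangent angle of a lane line from sampled points.

    points: list of (x, y) pixel coordinates along one lane marking,
    e.g. the four points P_{d,0}..P_{d,3}. Each consecutive pair yields
    one tangent angle; their mean gives the line direction.
    """
    angles = [
        math.atan2(y1 - y0, x1 - x0)          # tangent angle of one pair
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    return sum(angles) / len(angles)          # average over the 3 pairs
```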
Combining the average coordinates of the 4 points of interest with the average tangent angle θ̄d, we obtain the equations of the two lane lines. The intersection point of the two lines approximates the vanishing point; see Fig. 5.2. Next we consider how to extract ROIs from the original image. Considering vehicle height and the camera parameters, the top boundaries of all ROIs are 10 pixels higher than the vertical coordinate of the vanishing point. From the analysis of the camera parameters and image resolution, the heights of the near, middle, and far ROIs are 160, 60, and 30 pixels, respectively. The left and right boundaries of the near ROI are those of the image. The distance between the left boundary of the middle ROI and that of the image is one-third of the distance between the vanishing point and the left boundary of the image; the right boundary of the middle ROI is determined similarly. The distance between the left boundary of the far ROI and that of the image is two-thirds of the distance between the vanishing point and the left boundary of the image, and likewise for the distance between the right boundary of the far ROI and the right boundary of the image. Figure 5.2(b) shows the resulting ROIs.
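The ROI layout described above can be written out directly. The function below is a sketch assuming a 640 × 480 image, the stated heights (160/60/30 px), and the 1/3 and 2/3 insets toward the vanishing point; image coordinates have y increasing downward:

```python
def candidate_rois(vp, width=640, height=480):
    """Near/middle/far ROIs from the vanishing point vp = (vx, vy).

    All ROI tops sit 10 px above the vanishing point. The middle and
    far ROIs are inset from each image border by 1/3 and 2/3 of the
    distance from that border to the vanishing point.
    """
    vx, vy = vp
    top = vy - 10  # 10 px above the vanishing point (y grows downward)
    rois = {}
    for name, h, frac in (("near", 160, 0.0),
                          ("middle", 60, 1 / 3),
                          ("far", 30, 2 / 3)):
        left = int(frac * vx)                      # inset from left border
        right = int(width - frac * (width - vx))   # inset from right border
        rois[name] = (left, top, right, top + h)   # (x0, y0, x1, y1)
    return rois
```

For a centered vanishing point at (320, 240), the near ROI spans the full image width, while the far ROI shrinks toward the vanishing point, which is what keeps a near vehicle from occluding a far one's hypothesis region.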
Fig. 5.2 Vanishing point and ROI generation: (a) intersection of the two lane lines; (b) the generated ROIs; (c) ROIs of the lane region