Robot companion localization at home and in the office

Arnoud Visser, Jürgen Sturm, Frans Groen
Intelligent Autonomous Systems, Universiteit van Amsterdam
http://www.science.uva.nl/research/ias/

Abstract

The abilities of mobile robots depend greatly on the performance of basic skills such as vision and localization. Although great progress has been made in exploring and mapping extensive public areas with large holonomic robots on wheels, less attention has been paid to the localization of a small robot companion in a confined environment such as a room in an office or at home. In this article, a localization algorithm for the popular Sony entertainment robot Aibo inside a room is worked out. This algorithm provides localization information based on the natural appearance of the walls of the room. The algorithm starts by making a scan of the surroundings, turning the head and the body of the robot on a certain spot. The robot learns the appearance of the surroundings at that spot by storing color transitions at different angles in a panoramic index. The stored panoramic appearance is used to determine the orientation (including a confidence value) relative to the learned spot for other points in the room. When multiple spots are learned, an absolute position estimate can be made. The applicability of this kind of localization is demonstrated in two environments: at home and in an office.

1 Introduction

1.1 Context

Humans orient themselves easily in their natural environments. To be able to interact with humans, mobile robots also need to know where they are. Robot localization is therefore an important basic skill of a mobile robot, and of a robot companion like the Aibo in particular. Yet the Sony entertainment software contained no localization software until the latest release¹. Still, many other applications for a robot companion - like collecting a newspaper from the front door - strongly depend on fast, accurate and robust position estimates.
As long as the localization of a walking robot like the Aibo is based on odometry after sparse observations, no robust and accurate position estimates can be expected.

Most of the localization research with the Aibo has concentrated on the RoboCup. At the RoboCup², artificial landmarks such as colored flags, goals and field lines can be used to achieve localization accuracies below six centimeters [6, 8].

The price that these RoboCup approaches pay is their total dependency on artificial landmarks of known shape, position and color. Most algorithms even require manual calibration of the actual colors and lighting conditions used on a field, and are still quite susceptible to disturbances around the field, as produced for instance by brightly colored clothes in the audience.

The interest of the RoboCup community in more general solutions has been (and still is) growing over the past few years. The almost-SLAM challenge³ of the 4-Legged league is a good example of the state of the art in this community. For this challenge, additional landmarks with bright colors are placed around the borders of a RoboCup field. The robots get one minute to walk around and explore the field. Then the normal beacons and goals are covered up or removed, and the robot must move to a series of five points on the field, using the information learnt during the first minute. The winner of this challenge [6] reached the five points by using mainly the information of the field lines. The additional landmarks were only used to break the symmetry on the soccer field.

¹ Aibo Mind 3 remembers the direction of its station and toys relative to its current orientation.
² RoboCup Four Legged League homepage, last accessed in May 2006, http://www.tzi.de/4legged
³ Details about the Simultaneous Localization and Mapping challenge can be found at http://www.tzi.de/4legged/pub/Website/Downloads/Challenges2005.pdf

A more ambitious challenge is formulated in the newly founded RoboCup @ Home league⁴.
In this challenge the robot has to navigate safely toward objects in a living room environment. The robot gets 5 minutes to learn the environment. After the learning phase, the robot has to visit 4 distinct places/objects in the scenario, at least 4 meters away from each other, within 5 minutes.

1.2 Related Work

Many researchers have worked on the SLAM problem in general, for instance on panoramic images [1, 2, 4, 5]. These approaches are inspiring, but only partially transferable to the 4-Legged league. The Aibo is not equipped with an omni-directional high-quality camera. The camera in the nose has a horizontal opening angle of only 56.9 degrees and a resolution of 416 x 320 pixels. Further, the horizon in the images is not constant, but depends on the movements of the head and legs of the walking robot. So each image is taken from a slightly different perspective, and the path of the camera center is a circle only to a first approximation. Further, the images are taken while the head is moving. When moving at full speed, this can give a difference of 5.4 degrees between the top and the bottom of the image, so the image seems to be tilted as a function of the turning speed of the head. Still, the location of the horizon can be calculated by solving the kinematic equations of the robot. To process the images, a 576 MHz processor is available in the Aibo, which means that only simple image processing algorithms are applicable. In practice, the image is analyzed by following scan-lines with a direction relative to the calculated horizon. In our approach, multiple sectors above the horizon are analyzed, with multiple scan-lines in the vertical direction in each sector. One of the general approaches [3] also divides the image into multiple sectors, but there the image is omni-directional and each sector is characterized by its average color.
Our method analyzes each sector on a different characteristic feature: the frequency of color transitions.

2 Approach

The main idea is quite intuitive: we would like the robot to generate and store a 360° circular panorama image of its environment while it is in the learning phase. After that, it should align each new image with the stored panorama, and from that alignment the robot should be able to derive its relative orientation (in the localization phase). This alignment is not trivial because the new image can be translated, rotated, stretched and perspectively distorted when the robot does not stand at the point where the panorama was originally learned [11].

Of course, the Aibo is not able (at least not in real time) to compute this alignment on full-resolution images. Therefore a reduced feature space is designed so that the computations become tractable⁵ on an Aibo. So, a reduced circular 360° panorama model of the environment is learned. Figure 1 gives a quick overview of the algorithm's main components.

The Aibo performs a calibration phase before the actual learning can start. In this phase the Aibo first decides on a suitable camera setting (i.e. camera gain and shutter setting) based on the dynamic range of brightness in the autoshutter step. Then it collects color pixels by turning its head for a while, and finally clusters these into the 10 most important color classes in the color clustering step, using a standard implementation of the Expectation-Maximization algorithm assuming a Gaussian mixture model [9].
The result of the calibration phase is an automatically generated lookup table that maps every YCbCr color onto one of the 10 color classes and can therefore be used to segment incoming images into their characteristic color patches (see figure 2(a)). These initialization steps are worked out in more detail in [10].

⁴ RoboCup @ Home League homepage, last accessed in May 2006, http://www.ai.rug.nl/robocupathome/
⁵ Our algorithm consumes approximately 16 milliseconds per image frame, so we can easily process images at the full Aibo frame rate (30 fps).

Figure 1: Architecture of our algorithm.

(a) Unsupervised learned color segmentation. (b) Sectors and frequent color transitions visualized.
Figure 2: Image processing: from the raw image to sector representation. This conversion consumes approximately 6 milliseconds/frame on a Sony Aibo ERS7.

2.1 Sector signature correlation

Every incoming image is now divided into its corresponding sectors⁶. The sectors are located above the calculated horizon, which is generated by solving the kinematics of the robot. Using the lookup table from the unsupervised learned color clustering, we can compute the sector features by counting, per sector, the transition frequencies between each pair of color classes in the vertical direction. This yields histograms of 10x10 transition frequencies per sector, which we subsequently discretize into 5 logarithmically scaled bins. Figure 2(b) displays the most frequent color transitions for each sector. Some sectors have multiple color transitions in the most frequent bin, while other sectors have a single or no dominant color transition. This is only a visualization; not only the most frequent color transitions, but the frequencies of all 100 color transitions are used as the characteristic feature of the sector.

In the learning phase we estimate all these 80x(10x10) distributions⁷ by turning the head and body of the robot.
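The sector feature computation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the treatment of each image column as one scan-line, the counting of ordered (upper, lower) class pairs, and in particular the bin edges are assumptions, since the text only states that five logarithmically scaled bins are used.

```python
import numpy as np

NUM_CLASSES = 10  # color classes produced by the calibration phase
# Hypothetical bin edges (assumption): counts of 0, 1, 2-3, 4-7 and 8+ map to bins 0..4.
BIN_EDGES = np.array([1, 2, 4, 8])


def transition_counts(sector):
    """Count vertical color-class transitions inside one sector.

    `sector` is a 2-D integer array (rows x columns) of color-class labels,
    i.e. an already segmented image region; every column acts as a scan-line.
    """
    upper = sector[:-1, :].ravel()   # label of each pixel
    lower = sector[1:, :].ravel()    # label of the pixel directly below it
    changed = upper != lower         # keep only real class changes
    counts = np.zeros((NUM_CLASSES, NUM_CLASSES), dtype=np.int64)
    np.add.at(counts, (upper[changed], lower[changed]), 1)
    return counts


def discretize(counts):
    """Map raw transition counts onto five logarithmically spaced bins."""
    return np.digitize(counts, BIN_EDGES)


# Toy sector with two scan-lines and four rows of class labels.
sector = np.array([[0, 0],
                   [1, 0],
                   [1, 2],
                   [0, 2]])
bins = discretize(transition_counts(sector))  # 10x10 array of bin indices
```

Because only the number of transitions is counted, and not where they occur along the scan-line, the resulting 10x10 signature does not change when the same pattern is stretched vertically.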
We define a single distribution for a currently perceived sector by

$$P_{current}(i, j, bin) = \begin{cases} 1 & \text{if } discretize(freq(i, j)) = bin \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

where i, j are indices of the color classes and bin is one of the five frequency bins. Each sector is seen multiple times, and the many frequency count samples are combined into a distribution learned for that sector by the equation:

$$P_{learned}(i, j, bin) = \frac{count_{sector}(i, j, bin)}{\sum_{bin' \in frequencyBins} count_{sector}(i, j, bin')} \tag{2}$$

After the learning phase we can simply multiply the current and the learned distribution to get the correlation between a currently perceived and a learned sector:

$$Corr(P_{current}, P_{learned}) = \prod_{i, j \in colorClasses,\; bin \in frequencyBins} P_{learned}(i, j, bin) \cdot P_{current}(i, j, bin) \tag{3}$$

⁶ 80 sectors corresponding to 360°; with an opening angle of the Aibo camera of approx. 50°, this yields between 10 and 12 sectors per image (depending on the head pan/tilt).
⁷ When we use 16-bit integers, a complete panorama model can be described by (80 sectors) x (10 colors x 10 colors) x (5 bins) x (2 bytes) = 80 KB of memory.

2.2 Alignment

After all the correlations between the stored panorama and the new image signatures have been evaluated, we would like to get an alignment between the stored and seen sectors such that the overall likelihood of the alignment becomes maximal. In other words, we want to find a diagonal path with the minimal cost through the correlation matrix. This minimal path is indicated by green dots in figure 3. The path is extended to a green line for the sectors that are not visible in the latest perceived image.

We consider the fitted path to be the true alignment and extract the rotational estimate $\varphi_{robot}$ from the offset of its center pixel to the diagonal ($\Delta_{sectors}$):

$$\varphi_{robot} = \frac{360^{\circ}}{80} \Delta_{sectors} \tag{4}$$

This rotational estimate is the difference between the solid green line and the dashed white line in figure 3, indicated by the orange halter.
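Equations (1) to (4) can be illustrated with a short Python sketch. It is a hedged reading rather than the authors' code, and all function names are hypothetical. One assumption deserves emphasis: taken literally, the product in equation (3) over all bins would vanish, because the current distribution is zero in four of the five bins. The sketch therefore assumes the intended meaning is the product, over all color pairs, of the learned probability of the currently observed bin.

```python
import numpy as np

NUM_SECTORS = 80
NUM_CLASSES = 10
NUM_BINS = 5


def learn_distribution(bin_samples):
    """Equation (2): normalize repeated bin observations of one sector.

    `bin_samples` has shape (n_samples, NUM_CLASSES, NUM_CLASSES) and holds,
    for every sample, the observed bin index of each color pair.
    """
    counts = np.zeros((NUM_CLASSES, NUM_CLASSES, NUM_BINS))
    i, j = np.indices((NUM_CLASSES, NUM_CLASSES))
    for sample in bin_samples:
        counts[i, j, sample] += 1  # one-hot vote, as in equation (1)
    return counts / counts.sum(axis=2, keepdims=True)


def correlation(p_learned, current_bins):
    """Assumed reading of equation (3): product over all color pairs of the
    learned probability of the bin that is currently observed."""
    i, j = np.indices(current_bins.shape)
    return float(np.prod(p_learned[i, j, current_bins]))


def rotation_from_offset(delta_sectors):
    """Equation (4): convert a sector offset into degrees."""
    return 360.0 / NUM_SECTORS * delta_sectors


# Four observations of one sector; the pair (0, 1) falls into bin 2 once.
samples = np.zeros((4, NUM_CLASSES, NUM_CLASSES), dtype=int)
samples[3, 0, 1] = 2
p_learned = learn_distribution(samples)
```

With 80 sectors, one sector corresponds to 4.5 degrees, so an offset of 10 sectors gives the 45 degree rotation of figure 3(b). A practical implementation would probably add a small floor to the learned probabilities or sum logarithms, so that a single unseen bin does not force the whole product to zero; that refinement is not described in the text.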
Further, we try to estimate the noise by fitting a second path through the correlation matrix, far away from the best-fitted path:

$$SNR = \frac{\sum_{(x, y) \in minimumPath} Corr(x, y)}{\sum_{(x, y) \in noisePath} Corr(x, y)} \tag{5}$$

The noise path is indicated in figure 3 with red dots.

(a) Robot standing on the trained spot (matching line is just the diagonal). (b) Robot turned right by 45 degrees (matching line displaced to the left).
Figure 3: Visualization of the alignment step while the robot is scanning with its head. The green solid line marks the minimum path (assumed true alignment) while the red line marks the second-minimal path (assumed peak noise). The white dashed line represents the diagonal, while the orange halter illustrates the distance between the found alignment and the center diagonal ($\Delta_{sectors}$).

2.3 Position Estimation with Panoramic Localization

The algorithm described in the previous section can be used to get a robust bearing estimate together with a confidence value for a single trained spot. As we ultimately want to use this algorithm to obtain full localization, we extended the approach to support multiple training spots. The main idea is that the robot determines to what extent its current position resembles the previously learned spots and then uses interpolation to estimate its exact position. As we think that this approach could also be useful for the RoboCup @ Home league (where robot localization in complex environments like kitchens and living rooms is required), we may eventually want to store a comprehensive panorama model library containing dozens of previously trained spots (for an overview see [1]).

However, due to the computation time of the feature space conversion and panorama matching, only a single training spot and its corresponding panorama model can be selected per frame. Therefore, the robot cycles through the learned training spots one by one.
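The per-frame scheduling can be sketched as a simple round-robin loop. This is only an illustration under stated assumptions: `track_confidences` and the `match` callback are hypothetical names, and the exponential moving average with factor `alpha` is one possible way to smooth the per-image confidence values; the exact averaging scheme is not specified in the text.

```python
from itertools import cycle


def track_confidences(frames, n_spots, match, alpha=0.2):
    """Match each frame against one training spot in turn.

    `match(frame, spot)` is a hypothetical callback returning the confidence
    of matching `frame` against the panorama model of `spot`.
    """
    smoothed = [0.0] * n_spots
    # zip stops when the frames run out; cycle repeats spot indices 0..n-1.
    for frame, spot in zip(frames, cycle(range(n_spots))):
        confidence = match(frame, spot)
        # Exponential moving average (assumed smoothing scheme).
        smoothed[spot] += alpha * (confidence - smoothed[spot])
    return smoothed


# Toy matcher: frames always resemble spot 0 and never spot 1.
def toy_match(frame, spot):
    return 1.0 if spot == 0 else 0.0


result = track_confidences(range(4), n_spots=2, match=toy_match, alpha=0.5)
```

With n trained spots, each panorama model is refreshed only every n-th frame, so the smoothed confidence of a spot lags further behind the robot's motion as the model library grows.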
Every panorama model is associated with a gradually changing confidence value representing a sliding average of the confidence values we get from the per-image matching.

After training, the robot memorizes a given spot by storing the confidence values received from the training spots. By comparing a new confidence value with its stored reference, it is easy to deduce whether the robot stands closer to or farther from the imprinted target spot.

We assume that the imprinted target spot is located somewhere between the training spots. Then, to compute the final position estimate, we simply weight each training spot with its normalized corresponding confidence value:

$$position_{robot} = \sum_{i} position_{i} \, \frac{confidence_{i}}{\sum_{j} confidence_{j}} \tag{6}$$

This should yield zero when the robot is assumed to stand at the target spot, or a translation estimate towards the robot's position when the confidence values are no longer in balance.

To test the validity of this idea, we trained the robot on four spots on a regular 4-Legged field in our robolab. The spots were located along the axes, approximately 1 m away from the center. As target spot, we simply chose the center of the field. The training itself was performed fully autonomously by the Aibo and took less than 10 minutes. After training was complete, the Aibo walked back to the center of the field. We recorded the found position, kidnapped the robot to an arbitrary position around the field, and let it walk back again.

Please be aware that our approach to multi-spot localization is at this moment rather primitive and should be understood only as a proof of concept. In the end, the panoramic localization data from vision should of course be processed by a more sophisticated localization algorithm, like a Kalman or particle filter (not least to incorporate movement data from the robot).

3 Results

3.1 Environments

We selected four different environments to test our algorithm under a variety of circumstances.
The first two experiments were conducted at home and in an office environment⁸ to measure performance under real-world circumstances. The experiments were performed on a cloudy morning, on a sunny afternoon and late in the evening. Furthermore, we conducted exhaustive tests in our laboratory. Even more challenging, we took an Aibo outdoors (see [7]).

3.2 Measured results

Figure 4(a) illustrates the results of a rotational test in a normal living room. As the error in the rotation estimates ranges between -4.5 and +4.5 degrees, we may assume an alignment error of a single sector⁹; moreover, the size of the confidence interval corresponds to at most two sectors, which is the maximal angular resolution of our approach.

⁸ XX office, DECIS lab, Delft
⁹ Full circle of 360° divided by 80 sectors.

(a) Rotational test in natural environment (living room, sunny afternoon). (b) Translational test in natural environment (child's room, late in the evening).
Figure 4: Typical orientation estimation results of experiments conducted at home. In the rotational experiment on the left, the robot is rotated over 90 degrees on the same spot, and its orientation is estimated every 5 degrees. The robot is able to find its true orientation with an error estimate equal to one sector of 4.5 degrees. The translational test on the right is performed in a child's room. The robot is translated along a straight line of 1.5 meters, which covers the major part of the free space in this room. The robot is able to maintain a good estimate of its orientation, although the error estimate increases away from the location where the appearance of the surroundings was learned.

Figure 4(b) shows the effects of a translational dislocation in a child's room. The robot was moved along a straight line back and forth through the room (via the trained spot somewhere in the middle). The robot is able to estimate its orientation quite well on this line.
The discrepancy with the true orientation is between +12.1 and -8.6 degrees, close to the walls. This is also reflected in the computed confidence interval, which grows steadily when the robot is moved away from the trained spot. The results are quite impressive given the relatively big movements in a small room and the resulting significant perspective changes in that room.

Figure 5(a) also stems from a translational test (cloudy morning), which was conducted in an office environment. The free space in this office is much larger than at home. The robot was moved along a 14 m long straight line to the left and right, and its orientation was estimated. Note that the error estimate stays low at the right side of this plot. This is an artifact which nicely reflects the repetition of similar-looking working islands in the office.

In both translational tests it can be seen intuitively that the rotation estimates are within an acceptable range. This can also be shown quantitatively (see figure 5(b)): both the orientation error and the confidence interval increase slowly and gracefully when the robot is moved away from the training spot.

Finally, figure 6 shows the result of the experiment to estimate the absolute position with multiple learned spots. It can be seen that the localization is not as accurate as traditional approaches, but can still be useful for some applications (bearing in mind that no artificial landmarks are required). We repeatedly recorded a deviation to the upper right, which we think can be explained by the fact that different learning spots do not produce equally strong confidence values; we believe we can correct for that by means of confidence value normalization in the near future.

4 Conclusion

Although at first sight the algorithm seems to rely on specific texture features of the surrounding surfaces, in practice no dependency could be found.
This can be explained by two reasons: firstly, as the (vertical) position of a color transition is not used anyway, the algorithm is quite robust against (vertical) scaling. Secondly, as the algorithm aligns on many color transitions in the background (typically more than a hundred in the same sector), the few color transitions produced by objects in the foreground (like beacons and spectators) have a minor impact on the match (because their sizes relative to the background are comparatively small).

The lack of an accurate absolute position estimate seems to be a clear drawback with respect to the other methods, but bearing information alone can already be very useful for certain applications.

(a) Translational test in natural environment (office, cloudy morning). (b) Signal degradation as a function of the distance to the learned spot (measured in the laboratory).
Figure 5: Challenging orientation results. On the left, a translational test in an office environment over 14 meters along a line 80 centimeters from the learned spot (only one). A translation to the left of the office increases the error estimate, as expected. When translating to the right of the office, the orientation estimate oscillates, but the error estimate stays low. This is due to repeating patterns in the office: after 4 meters there is another group of desks and chairs which resem