Hello everyone! Have you ever wondered how do robots navigate? Then you are at the right place now to discuss on the basics of robotic mapping. Today we will dive into an algorithm which deals more about the process of robotic mapping and „SLAM“ – sequential localisation and mapping. I am Sprince, a hobby roboticist.
Imagine you are outside, visiting a new city. Let’s take for an example Linz here. And you would like to explore the city without the use of any navigation tools or maps in this case. The first thing that you do is, you get out of the center station – Linz Hauptbahnhof – and when you come out of the station, this is something similar to what you would look at. The moment you look at a picture or a view like this, the 1 and the 2 mentioned here on the pictures could be one of the landmarks that your brain chooses. These are a tall building and the other one is a commonly seen thing, a bus stop. Here it’s „Wissensturm“ and „bus stop“ that my brain chose as a landmark for instance. Now I move further away for a few meters and this could be the second observation. Again, the same bus stop 3 and 4 is a tall building, an apartment. What my brain basically does is, it tries to fuse these images together to create a map of the city in this case. Here, how can it be done? It could be similar to this. Here you can see that the bus stop in the image – first and second – is grouped together into one, thereby creating the map of this city from two images. When the process continues or iterates over a period of time, walking throughout the city, you could get a map of the whole city. This is something that our brain process and creates the map of. So now I know that when I come out of the train station, there is a bus stop opposite to the train station. I’m a person with poor memory, so imagine I take ten days to explore the city. After this, I go back to my hometown and I come back again after some days to reach a destination which is Ars Electronica in this case. Unfortunately my phone battery is out, so I can’t use the navigation tools now and I don’t prefer using the oldschool maps. So in this case I already explored the city, so I have a map of the city in my brain – I’m trying to use this now. When I start navigating, the first thing my brain does, it likes to localize myself. In this case, the first image that we saw was the Bahnhof. I want to localize myself – what is the distance between my location and the Bahnhof in this case. Once I come out of the train station, I would like to know the difference between my location and the bus station. And with that, approximately, I can localize myself in the map that my brain created in my previous trips. And now I can actually find the location of Ars Electronica, which is also predicted by my map. And in this manner, I can actually create a path with the use of my brain. To actually confirm it, there are a lot of landmarks and features that come across my route. So let’s say I start my journey to Ars Electronica, my brain tries to compare the landmarks – like the bus stops, other buildings – with the map, it compares the map with the things I’m looking at. My eyes are actually acting as a sensor, so that it confirms my predicted value and my observed value. This is how I could reach my destination. This is one approach, which I would use to reach a location in real time.
But how about robots? Do you even wonder why I’m talking about city exploration in this case? Well, the reason is to know if robots can do the same. And fortunately: Yes, it’s possible for robots to do that! We have eyes, which act as a sensor to observe and perceive the environment. In this case, there is a Lidar sensor, or an RGBD camera, which is commonly used for robots to solve such problems. Each of these sensors have their own advantages and disadvantages – recently there is a lot of development that is going on in the RGBD cameras, because cameras are cheap, compared to other sensoric devices.
Today, let’s discuss about an introduction or an overview about one such approach to solve this SLAM problem. SLAM is nothing but sequential localisation and mapping, so you’re trying to localize at the same point and trying to map the environment. Just like how I tried to map the city Linz. In such an approach, „RTAB SPLAM“ stands for „real time appearance based sequential planning, localisation and mapping.“ So this is the next level of SLAM where we are planning the localisation and the mapping at the same time. Here, as you can see with this algorithm, the inputs are a RGBD camera, TFs, odometry data that come from the controllers of the drives. These datas are synchronized together with the timestamps and then they are sent to the short term memory – this is the working memory that the robot uses, like the RAM in a PC – then it sends it to the loop closure. The loop closure is nothing but the same thing we did earlier with the pictures when I was explaining about an example of city exploration – so there are two images that have common features, so now their overlapping features are actually taken together and fused, so we get a map of the environment. This is how a global map is formed and here the point of interest are 2D occupancy grid map and the octo map, which is a three dimensional map. I’m more focused on a three dimensional map here for our demo today. This is what we expect at the end.
So let’s try it out! Here on the left we can see this is a demo environment. Unfortunately we can’t map the whole city, since we a have limited time restrictions. On the right you can see the robot’s view of the map. This red line that you can see are actually the laser-scan data. But for this demo purpose here we are not using the laser scan, we are just using an RGBD camera. Here you can see a video of the robot mapping the whole environment. This is a simple demo of the RTAB SLPAM mapping process. On the left is the environment, as I told earlier, that we are interested to map. On the right you can see the map that is being created by the robot. As the robot moves, the map is actually updated on the right side. On the center bottom, we could see the depth images and the RGB images. This is the view that the vehicle can look at, so basically we are looking from the vehicle’s eye. As I told earlier, there is a laser scanner that is involved, but we are not using it. The robot that is used here is a turtlebot with a laser scanner and a RGBD camera, but for demo purposes since we are using a RGBD camera, anything that involves lasers is noise. As you can see in the camera data, there is a red line that is crossing – this is actually a laser scan, which is considered as noise in this case. The robot cannot map the blindspots – as you can see – while it is moving. The blindspots are the ones that are with a little bit of a dark-grey area. And these blindspots can be filled when the robot moves near the objects, just like the ones that you can see here, where the map is being updated as the robot moves near the line. Now it looks as if the robot is lost in the line – it’s not that it’s lost, but the map created here is a three dimensional map and it’s traveling behind the wall in trying to map the whole environment. Now I’m trying to – due to time restrictions – map one pole that you could see, the bottom center pole, to check the performance of the algorithm, to see if this is really good with respect to the cameras. As you can see, once it moves to the side, the boundaries are also getting filled slowly. This tells you that, even if there are some blindspots, as soon as the camera goes near, it just gets updated. Another advantage of using such an algorithm is – let’s assume that one of the poles here is a temporary pole, today. Tomorrow, if it changes for instance, this map can correct itself and this is how the algorithm itself is working. This is the main advantage of using such an algorithm for creating the map of the environment – there is no need to replace or make changes in the future whenever the vehicle moves. As you can see here in the center, the center bottom pole is mapped perfectly. There is a gap inbetween, which is because the pole is rigid. It’s – again – a blindspot for the cameras of the vehicle. This is the 3D representation of the map that we created now.
Now, we are to the last conclusion part of our presentation. There are different approaches of solving SLAM problems, this is one such approach, which is more flexible – I would say. because it allows you to use different sensors. In this case I can use multiple cameras for instance, depth cameras, I can use stereo cameras with depth cameras, laser scanners, Lidars, IMU units, … all this data could be fused together and synchronized to create such maps. It also allows us to compare the performance of different sensoric devices, to know which one performs better with this algorithm. The key advantage is its ability to change or to correct the map in a dynamic environment. As I told earlier, today I’m standing here and if there is a robot now mapping this environment, it could consider me as an obstacle. But tomorrow I won’t be here. So if the robot moves tomorrow in the same environment, it knows that I am a temporary obstacle and it deletes my space from the map. This is the main advantage of using this RTAB SLAMS. Last but not least, working with robots is not fuzzy to understand, we understand the logic behind it. It’s pretty simple, compared to a human brain.
So thank you very much for your time! This is Sprince Jeberson signing off. If you have more questions, feel free to write me an email or contact me on LinkedIn. Thank you!