Month 2 - Image Gathering, Filtering, and Cleaning
By: Johnathan Coots
For month 2 I focused on gathering images via the Mapillary API, filtering out indoor images with Places365, and inpainting dynamic objects using RF-DETR image segmentation and LaMa inpainting.
Using the Mapillary API was a little difficult for me due to my inexperience with web sessions, but their documentation made the request process understandable. Essentially, you ask for the information on all images inside a bounding box defined by the minimum and maximum latitudes and longitudes of the desired area. That returns a list of metadata that includes the associated image IDs. You then send a follow-up request for each image ID to download the image, which comes back as a JSON object. The only caveats to this process are the limit on area size (0.01 square degrees of latitude and longitude, which is huge) and the maximum image count of 2000 images. I did run into some confusion with the documentation's wording about pagination of responses in relation to the 2000 image limit. Initially I thought the types of requests I would be sending would require me to follow up with a “Next Page” request, but after some careful reading of the documentation I believe this is only necessary if a sub-user is accessing the API via my access token.
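The bounding-box request described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the project: the endpoint URL and the parameter names (`bbox`, `fields`, `limit`, `access_token`) are my reading of the Mapillary documentation and should be verified there, and the helper names `split_bbox` and `build_image_query` are hypothetical. The sketch splits an area into tiles that respect the 0.01 square degree limit and builds one query URL per tile without sending any request.

```python
import math
from urllib.parse import urlencode

MAX_BBOX_AREA_DEG2 = 0.01      # area limit quoted in the report
MAX_IMAGES_PER_REQUEST = 2000  # image-count limit quoted in the report

# Assumed endpoint; check it against the Mapillary documentation.
IMAGES_ENDPOINT = "https://graph.mapillary.com/images"


def split_bbox(min_lon, min_lat, max_lon, max_lat, max_area=MAX_BBOX_AREA_DEG2):
    """Split a bounding box into a grid of equal tiles, each within max_area."""
    width = max_lon - min_lon
    height = max_lat - min_lat
    if width <= 0 or height <= 0:
        raise ValueError("bounding box must have positive width and height")
    side = math.sqrt(max_area)  # largest allowed side of a square tile
    cols = max(1, math.ceil(width / side))
    rows = max(1, math.ceil(height / side))
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                min_lon + c * width / cols,
                min_lat + r * height / rows,
                min_lon + (c + 1) * width / cols,
                min_lat + (r + 1) * height / rows,
            ))
    return tiles


def build_image_query(tile, access_token, fields=("id",)):
    """Build the query URL for one tile (parameter names are assumptions)."""
    params = {
        "access_token": access_token,
        "fields": ",".join(fields),
        # Assumed order: min_lon, min_lat, max_lon, max_lat.
        "bbox": ",".join(f"{value:.6f}" for value in tile),
        "limit": MAX_IMAGES_PER_REQUEST,
    }
    return f"{IMAGES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    for tile in split_bbox(-84.40, 33.70, -84.15, 33.85):
        print(build_image_query(tile, access_token="YOUR_TOKEN"))
```

A tile that still returns the full 2000 images probably holds more images than one response can carry, so splitting that tile further is one way to avoid relying on pagination.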
The Places365 implementation went smoothly due to previous experience with the repository, and I am successfully filtering indoor images from the data set.
I am currently running into some difficulties with the LaMa inpainting. While setting up the LaMa repository, I ran into some package conflicts. To avoid these I tried using an alternative repository, but the results have been subpar. One potential cause is that I am now using an RF-DETR segmentation model where I previously used an Ultralytics one, but so far I do not believe this is the issue: I have inspected the image masks the RF-DETR model created and they seem accurate. Instead, I will return to the original LaMa model I planned on using and try to resolve the package conflicts.
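Alongside visual inspection, the masks can be checked numerically before they reach the inpainting model. The sketch below is a possible check, not something done in the project so far: it merges per-object masks into one binary mask, grows it by a small margin (a common preprocessing step so the inpainter does not see object edges), and reports how much of the image is masked. It assumes masks are plain nested lists of 0/1 for readability, where a real pipeline would more likely use NumPy or OpenCV, and all function names are hypothetical.

```python
def union_masks(masks):
    """Combine per-object binary masks (lists of rows of 0/1) into one mask."""
    if not masks:
        raise ValueError("need at least one mask")
    height, width = len(masks[0]), len(masks[0][0])
    combined = [[0] * width for _ in range(height)]
    for mask in masks:
        if len(mask) != height or any(len(row) != width for row in mask):
            raise ValueError("all masks must share the same shape")
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    combined[y][x] = 1
    return combined


def dilate(mask, iterations=1):
    """Grow the masked region by one pixel per iteration (4-neighbourhood)."""
    height, width = len(mask), len(mask[0])
    current = [row[:] for row in mask]
    for _ in range(iterations):
        grown = [row[:] for row in current]
        for y in range(height):
            for x in range(width):
                if current[y][x]:
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            grown[ny][nx] = 1
        current = grown
    return current


def coverage(mask):
    """Fraction of pixels that are masked."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total
```

A very high coverage value is worth flagging, since inpainting quality generally drops as the masked region grows, which could look like a model problem when the cause is actually the input.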
Once the inpainting has been improved, I plan on implementing the MiDaS depth estimation. This will lead into the object matching and triangulation portions of the pipeline. My major goals for month 3 are to implement the object matching, triangulation, and NeRF portions of the pipeline so that I can spend month 4 focusing on paper work and refinement of the application.