We (Theo Coulson, along with Andnet Deboer and Derek Dietz) built a system that uses a 7-DOF Franka Emika Panda robot arm to place 1:87 (HO scale) model railway cars onto track.

The Problem:

Placing HO scale trains onto track is a difficult task for many humans. For context, an HO scale train is this big:

One of the models used in this project, with a hand and cutting mat for scale.

These trains work just like the real ones: to run properly on the tracks, the train must be placed so that the wheels are precisely aligned with the rails. Most miniature wheels are built such that they must be placed within ~2 mm of the exact center to run properly. Further, just as on full-scale trains, the wheels of many model trains can pivot independently of the car body, and require separate adjustment.

Our Setup:

The approach we settled on was to use a single camera for all input. We would use the locomotive as a landmark for track identification, then scan an image of the track to get its position and orientation. We would then scan a staging area where the cars (and the controller for the setup) would be placed. Next, the arm would pick up each car and place it onto the track, using the position data from computer vision and hardcoded measurements for each car. The user would initiate this process using a ROS2 service call.

Hardware:

To manipulate the cars (and carry the camera), we used the Franka Emika Panda arm with its stock gripper. The fingers of the gripper were replaced with a custom PLA design in order to grip the car properly. Our custom fingers were designed in Onshape, and you can view them here.

Our camera was a RealSense D450, which provides both RGB and depth information, enabling accurate localization of points in its image.

The locomotive was a Piko DB class 191. Our test cars for railing were a Lionel NYC caboose and a 40ft reefer of unknown make. Our track was steel Bachmann E-Z track. The controller was a Kato analog controller.

Computer vision:

Our computer vision system had three major subsystems: train identification, track localization, and train localization.

Train identification was done using Ultralytics’ YOLO system. To generate training data to fine-tune the YOLO model on our trains, we took videos of the cars using the Franka-mounted RealSense camera. We then used Grounding DINO and Meta’s SAM 2 to produce approximate bounding boxes from text descriptions of each car. The fine-tuned YOLO model produced minimum-area bounding boxes for every car in the frame; this allowed us to determine the location and orientation of the cars with reasonable accuracy.
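As a rough illustration of how a minimum-area box gives both location and orientation, here is a minimal sketch. It assumes the detector hands back the four corner points of the rotated box in order around the rectangle; the function name `pose_from_corners` is hypothetical and not taken from our code.

```python
import math

def pose_from_corners(corners):
    """Return (cx, cy, angle_rad) for a rotated box.

    corners: four (x, y) pixel points, in order around the rectangle
    (an assumption about the detector's output format).
    """
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0

    # Compare two adjacent edges; the longer one runs along the car's length.
    (x0, y0), (x1, y1), (x2, y2) = corners[0], corners[1], corners[2]
    edge_a = (x1 - x0, y1 - y0)
    edge_b = (x2 - x1, y2 - y1)
    long_edge = edge_a if math.hypot(*edge_a) >= math.hypot(*edge_b) else edge_b

    # The box alone cannot tell front from back, so fold the angle into [0, pi).
    angle = math.atan2(long_edge[1], long_edge[0]) % math.pi
    return cx, cy, angle

if __name__ == "__main__":
    print(pose_from_corners([(0, 0), (100, 0), (100, 20), (0, 20)]))
```

The angle is in image coordinates; converting it to the robot frame would need the camera pose, which this sketch leaves out.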

Localizing the track was one of the most sensitive measurements in this project: while our approach allowed some latitude for error when picking up the train, we relied heavily on our track position being extremely accurate. Fortunately, the tracks were much less visually complicated than the trains, so non-ML-based identification was possible. Our track identification used the contrast between the steel rails and black backing plastic to identify contours corresponding to the rails, and then found the midpoint line between them. This produced a highly stable set of points which we could use to determine the position and orientation of the set of track.
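The midpoint-line idea can be sketched in a few lines. This is a simplified stand-in, not our implementation: it assumes the contrast step has already produced a binary mask of rail pixels, that the track runs roughly top to bottom in the image, and it swaps contour extraction for a plain row-by-row scan. The name `track_midline` is hypothetical.

```python
import math

def track_midline(mask):
    """Estimate the track centerline from a binary rail mask.

    mask: 2D list of 0/1 values, 1 = rail pixel (assumed input format).
    Returns ((cx, cy), angle_rad): a point on the centerline and its
    direction in image coordinates, folded into [0, pi).
    """
    mids = []
    for v, row in enumerate(mask):
        cols = [u for u, val in enumerate(row) if val]
        # Crude check that the row spans two separated rails, not one blob.
        if len(cols) >= 2 and cols[-1] - cols[0] > 1:
            mids.append(((cols[0] + cols[-1]) / 2.0, float(v)))
    if len(mids) < 2:
        raise ValueError("not enough rows containing both rails")

    n = len(mids)
    cx = sum(p[0] for p in mids) / n
    cy = sum(p[1] for p in mids) / n

    # Principal-axis fit: direction of greatest spread of the midpoints.
    sxx = sum((p[0] - cx) ** 2 for p in mids)
    syy = sum((p[1] - cy) ** 2 for p in mids)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in mids)
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), angle % math.pi
```

Fitting the principal axis rather than regressing x on y or y on x keeps the estimate well behaved when the track is nearly vertical or nearly horizontal in the frame.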

The principal drawback of this system was its sensitivity to the camera location relative to the track; if the track was not centered in the frame, the output would be skewed. To correct for this, we took two track position measurements. The first was based on the locomotive’s position, and was used to make fine adjustments to the camera so that the track was as centered as possible in the second, final scan.
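One way to turn the first, coarse track position into a camera adjustment is the standard pinhole relation between a pixel offset and a metric offset. The sketch below assumes the camera looks straight down at the table and that the intrinsics (`fx`, `fy`, `cx`, `cy`) and the depth at the track are known; all names and numbers here are illustrative, not values from our setup.

```python
def centering_offset(track_px, depth_m, fx, fy, cx, cy):
    """Metric offset of the track from the optical axis, in the camera frame.

    track_px: (u, v) pixel location of the coarse track estimate
    depth_m:  distance from the camera to the track along the optical axis
    fx, fy:   focal lengths in pixels; cx, cy: principal point in pixels
    Moving the camera by (dx, dy) in its own x/y axes puts the track at the
    principal point, assuming the view direction stays fixed.
    """
    u, v = track_px
    dx = (u - cx) * depth_m / fx
    dy = (v - cy) * depth_m / fy
    return dx, dy

if __name__ == "__main__":
    # Placeholder intrinsics for a 1280x720 image.
    print(centering_offset((740, 360), 0.5, 600.0, 600.0, 640.0, 360.0))
```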

A similar two-part process ended up being essential for localizing the cars with sufficient accuracy. To bring each set of axles into alignment with the track, the center of each car needed to be found with greater precision than was possible with YOLO. To improve centering, we used the RealSense’s depth image, in which each car appeared as an easy-to-isolate contour.
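A minimal sketch of the depth-based refinement follows. It assumes a top-down depth image in meters, a known table depth, and a region of interest taken from the coarse detection; it uses the centroid of thresholded pixels as a simplified stand-in for contour extraction. The function name and parameters are hypothetical.

```python
def car_center_from_depth(depth, table_depth_m, min_height_m=0.01, roi=None):
    """Centroid (u, v) of pixels that stand above the table.

    depth:         2D list of depths in meters (0 = invalid reading)
    table_depth_m: depth of the bare table surface
    min_height_m:  how far above the table a pixel must be to count as car
    roi:           optional (u0, v0, u1, v1) box, end-exclusive, e.g. from
                   the coarse detection; defaults to the whole image
    Returns None if no pixel qualifies.
    """
    rows, cols = len(depth), len(depth[0])
    u0, v0, u1, v1 = roi if roi else (0, 0, cols, rows)
    sum_u = sum_v = count = 0
    for v in range(v0, v1):
        for u in range(u0, u1):
            d = depth[v][u]
            # Keep valid pixels that are closer to the camera than the table.
            if d > 0 and table_depth_m - d >= min_height_m:
                sum_u += u
                sum_v += v
                count += 1
    if count == 0:
        return None
    return sum_u / count, sum_v / count
```

Restricting the search to the coarse detection box keeps neighboring cars and the controller out of the centroid.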

Motion Planning:

Once we had the positions of everything in the scene following our scan process, the actual manipulation of the cars was relatively straightforward. Once they were picked up by the gripper, we would place the center of the car on the track. Due to the rotational freedom of the wheels relative to the car body, this rarely resulted in all wheels being properly engaged. We would then have the arm grab and slightly raise each set of axles before squeezing them in order to force them into alignment with the track.
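The grasp points for the axle adjustment follow from simple geometry once the car is centered on the track. The sketch below assumes each car has a hardcoded distance from its center to each set of axles and that the track direction is known in the robot's base frame; the function name and the example numbers are placeholders, not our measurements.

```python
import math

def axle_grasp_points(car_center, track_angle_rad, axle_offset_m):
    """Positions of the two axle sets for a car centered on the track.

    car_center:      (x, y) of the car's center in the robot base frame
    track_angle_rad: direction of the track in that frame
    axle_offset_m:   hardcoded distance from the car center to each axle set
    Returns two (x, y) points, one on each side of the center along the track.
    """
    x, y = car_center
    dx = axle_offset_m * math.cos(track_angle_rad)
    dy = axle_offset_m * math.sin(track_angle_rad)
    return (x + dx, y + dy), (x - dx, y - dy)

if __name__ == "__main__":
    # Illustrative values only.
    print(axle_grasp_points((0.40, 0.00), 0.0, 0.05))
```

The gripper would also need to be yawed to match the track angle before squeezing; that step is omitted here.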

This stage of the project proved to be our ultimate bottleneck – while we were regularly able to get close to perfect placements, we very frequently found that this final adjustment stage either did not meaningfully change the orientation of the wheels, or resulted in them being further from the proper position than they started. Despite extensive tweaking (including modifications to the fingers), we were not able to make this step work reliably in the time we had. Further development – particularly, improving the padding used on the ends of the fingers to avoid scratching the cars – could likely render this step far more robust.

Choo Choo!