GSoC 2025 - Closing Comments and Final Submission
Farewell!
This is my final blog post for the GSoC 2025 project developing a routing engine for pysal.spopt. I’ll link to my pull request for the new feature and summarize some of what I did this summer below, but first I will describe some key skills I was able to develop through this work.
Key Lessons from GSoC 2025
Before I summarize my work, I’ll briefly list key lessons that I took away this summer.
Object-Oriented Programming
My pythonic cantrips have been greatly improved through this work. While I’ve always had an affinity for computers, my skills in python have come only recently in my academic career. I’ve always developed ‘as needed’ for my work, which has meant lengthy for-loops to process and plot census data, only coming to analytical methods in python in the last few years. This summer, I feel like I really took a leap forward in terms of my understanding of object-oriented programming.
Keyword Handling
A specific pain point was learning how python class objects enable different keyword argument schemes. Because my mentors and the PySAL ecosystem are full of sharp developers, I had many opportunities to learn not just how class objects can handle different argument schemes, but best practices for setting up these method signatures for different purposes. This was especially important when developing spopt.route because I needed to figure out how to handle both a router-provided and a no-router-provided case. This meant handling multiple routers with different additional keyword requirements, as well as accounting for cases where no keywords are provided at all.
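To make that concrete, here’s a minimal sketch of the pattern I mean. The names here are illustrative only, not the actual spopt.route signatures: an optional router argument decides which branch runs, and **router_kwargs forwards whatever extra keywords a particular routingpy router needs.

```python
# Hypothetical sketch of the keyword-handling pattern (names are illustrative,
# not the actual spopt.route signatures).
def travel_costs(locations, router=None, **router_kwargs):
    """Return a travel-cost matrix from `router`, or fall back if none is given."""
    if router is None:
        # no-router case: stand-in for the haversine fallback described later in this post
        raise NotImplementedError("fall back to straight-line distances here")
    # router-provided case: forward router-specific keywords (profile, API key, ...)
    return router.matrix(locations=locations, **router_kwargs).durations
```

With this shape, the same entry point covers both a bare call like travel_costs(pts) and a call that forwards router-specific keywords, e.g. travel_costs(pts, router=some_routingpy_client, profile="driving").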
Git Juggling and Environment Management
These are two essential bits of collaborative development that sit in the shadow of a project’s actual codebase. For this project, I had to augment my comfortable conda-forge development environment with pip, which was necessary for installing routingpy. I eventually needed to clone the GitHub repository for this library anyway, which was another layer of management. I already had an idea of how useful it could be to maintain an environment.yml file for a given project, but this work really solidified that for me. As well, this was the largest project that I have ever contributed to, and I had to be careful to manage my commit history and be precise in how I integrated my code into the larger spopt library. After having used different parts of PySAL throughout my graduate academic career, it’s been a real honor to finally contribute!
AI
Permit me, for a moment, to opine on something topical. In the not-too-distant past, AI acolytes were prognosticating the coming of AGI, artificial general intelligence. Each person was going to have their own Jarvis. Agentic AI was promised to fill any gaps left in our busy lives, ensuring that we never missed an appointment, that we would have proper heads-up for any difficulties we might encounter, that we could ask it to buy concert tickets and it would not only buy the tickets, but schedule the event for us and send messages to our friends confirming that we would be in attendance. Instead, AI capacity appears to have plateaued. Sam Altman himself recently suggested that the industry is in a bubble. Just another technological development that has been over-hyped, over-invested in, and that threatens to upend society.
Beyond concerns of environmental impacts, there are concerns about the impacts on human societies - our attention spans, our patience to learn organically rather than having answers dispensed from a machine. These are not unfounded concerns. In my capacity as an instructor at a California State University, I have seen firsthand how students avoid workloads, frictionlessly bypassing entire blocks of a course and ensuring that they retain nothing and leave the class feeling like it was a waste of time (and money). Additionally, I have real concerns about the long-term impacts of the slopification of information, not just in the sense that there could be immediate impacts on cognitive function, but in the ‘wow, language as we know it is a pretty recent development in human evolution if you really think about it; wonder if we’re taking it for granted’ kind of way.
However, there are clear uses for AI that benefit specific work contexts. The tool is ultimately limited by the person using it, and undergrads have always found ways to bypass required courses they have no interest in (I know I did!). The toothpaste is out of the tube; generative AI chatbots aren’t going anywhere. Let me share some practices that have helped me get the most out of them.
Be specific
This gets easier with practice, but it’s important to break the task that you are trying to do into incremental, easy-to-process pieces. The reason for this is that the chatbots aren’t as good as they want you to think they are. The more substantial the task, the higher the likelihood that they will introduce new errors.
Ask for conceptual reviews (comments) rather than edits
While it can be very helpful to ask for a specific syntactical fix, e.g.:
Q: What is the NumPy function that turns an array into zeros like another array?
A: numpy.zeros_like(...)
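For instance, an answer like that is quick to sanity-check in a throwaway snippet:

```python
import numpy as np

a = np.array([[1.5, 2.0], [3.0, 4.5]])
b = np.zeros_like(a)  # same shape (2, 2) and dtype (float64) as `a`, filled with zeros
```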
I’ve also found value in asking the chatbot to review a function that I’ve written and flag anything that looks out of whack. This can sometimes catch indentation errors or logical errors. It will often make suggestions about what could cause errors in relation to the larger codebase, such as ‘if argument x is not explicitly provided, then the function will fail downstream.’ This was also helpful for validating my understanding of more advanced object-oriented patterns in python.
Don’t copy and paste the outputs
When you do ask for specific edits to your code, it can be tempting to copy the output directly and replace your existing code block. More often than not, you are not providing the AI with the full context of the changes you have made since your previous messages, and it will produce stale versions of your code, or code with additional bits based on assumptions the AI made about your purpose for writing it.
A better practice is to identify the key changes made by the AI that directly address the question posed and write those changes by hand. This helps to solidify any new knowledge you gain from the interaction, and next time you might not need to ask chat at all; you can simply identify the source of the problem yourself, ultimately making you a better coder.
When I have copied and pasted, I find myself in a loop where the code provided doesn’t work, so I re-pose the question, but because I don’t know why the code doesn’t work, the returned answer is not very productive. Back and forth we go, me asking chat to please rewrite its answer based on the aggregated data that it initially spat back at me, and chat returning an answer that is about 60% correct and only half as useful. Slop in, slop out.
GSoC Summary
Okay, I’ll get off my soapbox now; here’s a summary of the steps I took to complete the project.
Fork the pysal/spopt repository
I had actually forked this repository a few years ago when I was learning how to use GitHub with Serge Rey at U.C. Riverside, so I just had to update my (very) stale fork.
Check out a new feature branch
I generated a new branch based on the initial PR from ljwolf’s main branch.
Spin up the Open Source Routing Machine (OSRM)
This is probably the most arcane element in the new module, though my hope is that the new route.ipynb notebook in the spopt repository demystifies it a bit. Basically, the OSRM backend is a Docker image that takes up a port on your computer and listens for requests. Routing requests are sent to this local OSRM server, which returns the data. However, in order for this to work, you need to tell OSRM what general area you are operating in and do some pre-processing of the data files.
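As a rough illustration of what ‘listening for requests’ means in practice, here is a minimal sketch that pings the backend directly over HTTP. It assumes the Docker container is mapped to localhost:5000 and was pre-processed with an OpenStreetMap extract covering Dublin; the coordinates are just two illustrative points.

```python
import requests

# Two longitude,latitude pairs, separated by ';' as the OSRM HTTP API expects.
coords = "-6.2603,53.3498;-6.2675,53.3441"

# Ask the local OSRM backend (assumed to be listening on port 5000) for a route.
resp = requests.get(f"http://localhost:5000/route/v1/driving/{coords}")
route = resp.json()["routes"][0]
print(route["duration"], route["distance"])  # travel time (s) and distance (m)
```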
Reproduce the Guinness example
ljwolf’s initial PR contained an example VRP where Guinness needs to be delivered from a single central depot to all of the pubs in Dublin, Ireland. The first major hurdle for this project was simply to reproduce this problem and its solution. It was initially developed for use with homespun tools; my task was to play with it and make changes to the code until I stopped seeing tracebacks and started seeing the solver outputs.
Decide how to ‘generalize’ the routing engine
We settled on wrapping the spopt.route module around routingpy because it handles requests through many different routers but returns queries in a uniform way. Each routing service has methods called matrix and directions. matrix returns N × N distance and/or duration matrices, where N is the combined number of clients and depots, while directions is used to obtain the ‘way’, or the geometry linestring object between two points on the map. So the matrices are first obtained and used to solve the optimization problem, and then the solution points and routes are passed to directions, which helps build the visualization of the solution. All of this core functionality can be found in spopt/route/engine.py, in two functions: build_specific_route and build_route_table.
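Here is a hedged sketch of that matrix-then-directions pattern using routingpy’s OSRM client directly, outside of spopt.route. The base URL, the coordinates, and the placeholder ‘solver’ step are all assumptions for illustration, not code from the module.

```python
import routingpy

# routingpy client pointed at a local OSRM backend (assumed on localhost:5000).
client = routingpy.OSRM(base_url="http://localhost:5000")

depot = (-6.2603, 53.3498)  # (longitude, latitude)
pubs = [(-6.2675, 53.3441), (-6.2489, 53.3331)]
locations = [depot] + pubs

# 1. matrix: pairwise travel costs that feed the optimization model.
durations = client.matrix(locations=locations, profile="driving").durations

# ... solve the VRP on `durations`; pretend the solver returns this tour ...
tour = [0, 1, 2, 0]  # placeholder visiting order, depot -> pubs -> depot

# 2. directions: recover the 'way' (geometry) along the chosen tour.
way = client.directions(locations=[locations[i] for i in tour], profile="driving")
print(way.duration, way.geometry[:3])  # total seconds and first few coordinates
```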
This architecture means that while we decided on OSRM as the default routing service, a user could theoretically pass one of the different API-key enabled routers to spopt.route and the results would be usable throughout the module. In truth, the two functions in engine.py might require some changes to enable this, but these would be minor now that the core infrastructure is in place.
Implement the haversine fallback
Once the routing engine was in place, it was important to implement a fallback in the event that no routing engine is passed by the user, in which case haversine (straight-line) distances are used in place of the road network. While this is inherently less useful as a vehicle routing solution, it keeps the module usable if the user cannot start the OSRM backend. Additionally, it’s not difficult to imagine an application where goods are carried on foot, and haversine distances might be more appropriate than a vehicular route.
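For reference, this is the kind of great-circle computation the fallback relies on. The sketch below is not the spopt.route implementation, just a vectorized version of the standard haversine formula; the coordinates are two illustrative points in Dublin.

```python
import numpy as np

def haversine_matrix(lonlat, radius_km=6371.0):
    """Pairwise great-circle distances (km) for an array of (lon, lat) in degrees."""
    lonlat = np.radians(np.asarray(lonlat, dtype=float))
    lon, lat = lonlat[:, 0], lonlat[:, 1]
    dlon = lon[:, None] - lon[None, :]
    dlat = lat[:, None] - lat[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

pts = [(-6.2603, 53.3498), (-6.2675, 53.3441)]
print(haversine_matrix(pts))  # 2 x 2 matrix; off-diagonal entries are ~0.8 km
```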
Produce instructional/demo notebook
The last step, then, was to produce a Jupyter notebook communicating how to set up and use this new module. I have been building the notebook from my notes about the project throughout the summer, so it was pretty straightforward to adapt those notes into a narrative structure, but a writer never knows how useful their prose is until they get feedback from readers.
Pull request (final submission for GSoC 2025)
Find the PR here.
Testing and post-GSoC 2025 development
One final thing to implement is a test file for the new module’s functionality using pytest. After this, the GSoC project is officially wrapped. I’m grateful for this opportunity to learn so much in such a short time, and honored to be able to contribute to the PySAL library!
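To give a flavor of what I have in mind, here is a hedged pytest sketch. It tests a stand-in for the haversine fallback rather than the actual spopt.route code, since the test file itself is still to come; the helper and the coordinates are illustrative.

```python
import math

import pytest

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance (km) between two (lon, lat) points in degrees (stand-in)."""
    (lon1, lat1), (lon2, lat2) = map(math.radians, p), map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def test_fallback_distances_behave_like_a_metric():
    depot, pub = (-6.2603, 53.3498), (-6.2675, 53.3441)  # two points in Dublin
    assert haversine_km(depot, depot) == pytest.approx(0.0)                      # zero self-distance
    assert haversine_km(depot, pub) == pytest.approx(haversine_km(pub, depot))   # symmetry
    assert 0.5 < haversine_km(depot, pub) < 1.5                                  # plausible magnitude (~0.8 km)
```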
I’ve suggested above that there is still more work that might need to be done to allow compatibility with other routing engines that are supported by routingpy, and I’ll still be around to provide feedback for any future PR or issue tickets that arise in service of this. As well, I’m likely to adapt code developed this summer to help me work through my dissertation, and I’d be happy to lend some of this to the library if appropriate.
Finally, let me extend a heartfelt thanks to the mentors that guided me through this project, Levi John Wolf and Germano Barcelos dos Santos, as well as the larger PySAL development team who were instrumental in the production of this module and in my learning over this summer.
Thanks for reading!