This is good, but it would also be good to mention that you're using UMAP for dimensionality reduction with the cosine metric.
https://github.com/Z-Gort/Reservoirs-Lab/blob/main/src/elect...
Dimensionality reduction from n >> 2 dimensions down to 2 dimensions can be very fickle, so the hyperparameters matter. Your visualization can change significantly depending on the choice of metric.
https://umap-learn.readthedocs.io/en/latest/parameters.html
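A minimal sketch of what I mean with umap-learn (the array here is a random stand-in for your real embeddings):

    import numpy as np
    import umap  # pip install umap-learn

    # Stand-in for your real (n_samples, n_dims) embedding matrix.
    embeddings = np.random.rand(1000, 384).astype(np.float32)

    # The same data can land in very different 2-D layouts depending on the metric.
    for metric in ("cosine", "euclidean"):
        xy = umap.UMAP(n_components=2, metric=metric, random_state=42).fit_transform(embeddings)
        print(metric, xy.shape)  # plot these side by side to compare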
You may want to consider projecting to more than 2 dimensions too. You may ask, how does one visualize more than two dimensions? Through a scatterplot matrix, 2 axes at a time.
https://seaborn.pydata.org/examples/scatterplot_matrix.html
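Rough sketch of the idea, assuming umap-learn and seaborn, with random data standing in for real embeddings:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    import umap

    embeddings = np.random.rand(1000, 384).astype(np.float32)  # stand-in data

    # Project to 4 dimensions instead of 2...
    proj = umap.UMAP(n_components=4, metric="cosine").fit_transform(embeddings)
    df = pd.DataFrame(proj, columns=[f"dim{i}" for i in range(4)])

    # ...then look at every pair of axes at once as a scatterplot matrix.
    sns.pairplot(df, plot_kws={"s": 5})
    plt.show()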
These are used for PCA-type multivariate analyses to visualize latent variables in more than 2 dimensions, but 2 dimensions at a time. Some clustering behavior that cannot be seen in 2 axes might be seen in higher dimensions. We used to do this in our lab to find anomalies in high dimensions.
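Something along these lines (scikit-learn and seaborn; random data stands in for the real measurements):

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    X = np.random.rand(1000, 50)  # stand-in for the real high-dimensional data

    # Keep the first few principal components and inspect them pairwise;
    # points far from the main cloud in any panel are anomaly candidates.
    pcs = PCA(n_components=5).fit_transform(X)
    df = pd.DataFrame(pcs, columns=[f"PC{i + 1}" for i in range(5)])
    sns.pairplot(df, plot_kws={"s": 5})
    plt.show()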
About fickleness... indeed I've found this to be kind of a problem when running large-d text embeddings through UMAP -- the result always comes out spherical and blob-shaped, without any obvious segregation in the low-d projected space.
IMO it's very difficult to make a "fire and forget" embedding interpreter. Maybe I never found the right parameters for UMAP, but the results of running it (or any dimensionality reduction algo) always left me a bit underwhelmed.
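For reference, the two parameters that most affect the blobbiness are n_neighbors and min_dist; a quick sweep looks something like this (stand-in data again):

    import numpy as np
    import umap

    embeddings = np.random.rand(5000, 768).astype(np.float32)  # stand-in data

    # Larger n_neighbors emphasizes global structure; smaller min_dist packs
    # clusters more tightly. Small multiples of these often reveal whether
    # the blob is a parameter problem or genuinely unstructured data.
    for n_neighbors in (5, 15, 50, 200):
        for min_dist in (0.0, 0.1, 0.5):
            xy = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist,
                           metric="cosine").fit_transform(embeddings)
            # plot each xy in a grid and compare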
Have you tried PaCMAP? It should be better and faster
Thanks for the pointer to PaCMAP.
I just tried it. My verdict?
PaCMAP >= UMAP >> t-SNE.
UMAP captures the basic pattern but PaCMAP makes it crisper.
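For anyone who wants to reproduce the comparison, both libraries expose the same fit_transform interface (random stand-in data here):

    import numpy as np
    import pacmap  # pip install pacmap
    import umap    # pip install umap-learn

    X = np.random.rand(2000, 512).astype(np.float32)  # stand-in for embeddings

    # Run the same data through both reducers and plot the results side by side.
    xy_umap = umap.UMAP(n_components=2, metric="cosine").fit_transform(X)
    xy_pacmap = pacmap.PaCMAP(n_components=2).fit_transform(X)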
Wow, thanks for that!
I have yet to find a better tool than the old Tensorflow projector: https://projector.tensorflow.org/
Granted, it requires you to prepare your data as TSV files first.
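The prep is only a couple of lines if your vectors are already in numpy (file names here are just examples):

    import numpy as np
    import pandas as pd

    embeddings = np.random.rand(1000, 256)                 # stand-in vectors
    labels = [f"doc {i}" for i in range(len(embeddings))]  # stand-in metadata

    # vectors.tsv: one embedding per row, tab-separated, no header.
    np.savetxt("vectors.tsv", embeddings, delimiter="\t")

    # metadata.tsv: one row per vector; a single column needs no header row.
    pd.Series(labels).to_csv("metadata.tsv", sep="\t", index=False, header=False)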
That is indeed an excellent tool. It allows one to dynamically adjust and recompute UMAP and t-SNE.
Let me know if anyone has any thoughts... if I could go back, I might not have gone with Electron.
Doing dimensionality reduction locally posed a few challenges in terms of application size. The idea was that by analyzing just a few thousand randomly sampled points, you can get a feel for your data through a local GUI where you interact with it and see some correlated metadata.
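Conceptually the sampling step looks like this (a sketch with psycopg2 and hypothetical table/column names, not the app's actual code):

    import numpy as np
    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect("postgresql://localhost/mydb")  # hypothetical connection
    with conn, conn.cursor() as cur:
        # Pull a random sample of rows plus some metadata to show alongside the plot.
        cur.execute("SELECT id, embedding, title FROM documents ORDER BY random() LIMIT 5000")
        rows = cur.fetchall()

    ids, vecs, titles = zip(*rows)
    # Without the pgvector adapter registered, vectors come back as strings
    # like "[0.1,0.2,...]", so parse them before reducing.
    X = np.array([[float(x) for x in v.strip("[]").split(",")] for v in vecs], dtype=np.float32)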
I'm not sure there's much need for a standalone GUI to go along with Postgres as a vector DB -- maybe people just do analysis separately from a normal "GUI"? But maybe not.
What do you think?
Just some quick feedback: I can't copy & paste in the connection URL input form. On a Mac.
Once loaded, I get the error "Table must contain a UUID column for vector visualization."
I'm assuming it's trying to find an ID column for grouping? Can we manually specify this? My ID columns are varchars.
Same here. I'm using LangChain, which creates a varchar id column. It also puts different collections in the same table.
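A possible workaround until varchar ids are supported: add a parallel UUID column (the table name here is LangChain's default, so double-check yours; gen_random_uuid needs Postgres 13+):

    import psycopg2

    conn = psycopg2.connect("postgresql://localhost/mydb")  # hypothetical connection
    with conn, conn.cursor() as cur:
        # Adds a UUID column; existing rows are filled via the default.
        cur.execute("""
            ALTER TABLE langchain_pg_embedding
            ADD COLUMN IF NOT EXISTS row_uuid uuid DEFAULT gen_random_uuid()
        """)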
Have folks seen https://atlas.nomic.ai/? Absolutely beautiful vector visualization.
A proprietary hosted solution that gains as I uncover insights in my data? Hard pass.
It seems to require signing up just to view it.
Why use PostgreSQL instead of columnar databases that are likely to perform way better for these types of analytical workloads?
README suggestions:
Put the animated gif at the top
Add subtitles to the gif explaining what you're doing.
If I had a nickel for every GUI/viz tool that buries the image/video or straight up doesn't have it in the readme... It lends credence to the popular opinion that engineers don't know how to communicate.
Does this use pgVector?
It lets you visualize any column with type "EMBEDDING", and I think the only way to get that is through pgvector/pgvectorscale.
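For context, the pgvector column type is literally called vector in SQL; something like this is what produces an embedding column (assuming the extension is installed, with hypothetical table/connection names):

    import psycopg2

    conn = psycopg2.connect("postgresql://localhost/mydb")  # hypothetical connection
    with conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")  # pgvector
        cur.execute("""
            CREATE TABLE IF NOT EXISTS documents (
                id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
                title text,
                embedding vector(384)  -- 384-dimensional embedding column
            )
        """)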
That is an excellent visualization!
Very interesting, thanks for sharing!
As a non-native English speaker who isn't very familiar with vector databases, I find the title very ambiguous. I understood it as Postgres being a GUI for some VectorDB. On closer inspection, I realized that "Postgres as a VectorDB" is the full name. Maybe shorten that to something else. Just my 2 cents.
It’s just plain bad grammar; the title should be
“Show HN: Reservoirs Lab, a Postgres VectorDB GUI”
I think the confusing term is "VectorDB" which sounds like a name of an existing product. "A vector db GUI powered by Postgres"?