• 0 Posts
  • 51 Comments
Joined 1Y ago
Cake day: Jun 09, 2023


The data are stored, so it’s not a live-feed problem. It is an inordinate amount of data that’s stored though. I don’t actually understand this well enough to explain it well, so I’m going to quote from a book [1]. Apologies for wall of text.

“Serial femtosecond crystallography [(SFX)] experiments produce mountains of data that require [Free Electron Laser (FEL)] facilities to provide many petabytes of storage space and large compute clusters for timely processing of user data. The route to reach the summit of the data mountain requires peak finding, indexing, integration, refinement, and phasing.” […]

"The main reason for [steep increase in data volumes] is simple statistics. Systematic rotation of a single crystal allows all the Bragg peaks, required for structure determination, to be swept through and recorded. Serial collection is a rather inefficient way of measuring all these Bragg peak intensities because each snapshot is from a randomly oriented crystal, and there are no systematic relationships between successive crystal orientations. […]

Consider a game of picking a card from a deck of all 52 cards until all the cards in the deck have been seen. The rotation method could be considered as analogous to picking a card from the top of the deck, looking at it and then throwing it away before picking the next, i.e., sampling without replacement. In this analogy, the faces of the cards represent crystal orientations or Bragg reflections. Only 52 turns are required to see all the cards in this case. Serial collection is akin to randomly picking a card and then putting the card back in the deck before choosing the next card, i.e., sampling with replacement (Fig. 7.1 bottom). How many cards are needed to be drawn before all 52 have been seen? Intuitively, we can see that there is no guarantee that all cards will ever be observed. However, statistically speaking, the expected number of turns to complete the task, c, is given by: $c = n \sum_{k=1}^{n} \frac{1}{k}$, where n is the total number of cards. For large n, c converges to n*log(n). That is, for n = 52, it can reasonably be expected that all 52 cards will be observed only after about 236 turns! The problem is further exacerbated because a fraction of the images obtained in an SFX experiment will be blank because the X-ray pulse did not hit a crystal. This fraction varies depending on the sample preparation and delivery methods (see Chaps. 3–5), but is often higher than 60%. The random orientation of crystals and the random picking of this orientation on every measurement represent the primary reasons why SFX data volumes are inherently larger than rotation series data.

The second reason why SFX data volumes are so high is the high variability of many experimental parameters. [There is some randomness in the X-ray pulses themselves]. There may also be a wide variability in the crystals: their size, shape, crystalline order, and even their crystal structure. In effect, each frame in an SFX experiment is from a completely separate experiment to the others."
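
To sanity-check the card-deck numbers in that quote, here is a minimal Python sketch of my own (not from the book). It assumes the quoted expectation is the standard coupon-collector result, n times the n-th harmonic number; the function names `expected_draws` and `simulate_draws` are just ones I made up.

```python
import math
import random


def expected_draws(n: int) -> float:
    """Expected draws with replacement needed to see all n cards: n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def simulate_draws(n: int, rng: random.Random) -> int:
    """Draw cards uniformly with replacement until every one of the n cards has been seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


if __name__ == "__main__":
    n = 52
    rng = random.Random(0)  # fixed seed so reruns give the same estimate
    trials = 2_000
    mean = sum(simulate_draws(n, rng) for _ in range(trials)) / trials
    print(f"exact n * H_n : {expected_draws(n):.1f}")
    print(f"n * ln(n)     : {n * math.log(n):.1f}")
    print(f"simulated mean: {mean:.1f}")
```

For n = 52 the exact sum comes out at roughly 236, which matches the figure in the quote, whereas n*ln(n) on its own is only about 205. So the 236 seems to come from the full sum rather than the large-n approximation.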

“The Realities of Experimental Data”: "The aim of hit finding in SFX is to determine whether the snapshot contains Bragg spots or not. All the later processing stages are based on Bragg spots, and so frames which do not contain any of them are useless, at least as far as crystallographic data processing is concerned. Conceptually, hit finding seems trivial. However, in practice it can be challenging."

“In an ideal case shown in Fig. 7.5a, the peaks are intense and there is no background noise. In this case, even a simple thresholding algorithm can locate the peaks. Unfortunately, real life is not so simple”
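
For a feel of what "a simple thresholding algorithm" could look like in that ideal, noise-free case, here is a toy Python sketch. It is purely my own illustration and not how CrystFEL or any real pipeline does it. The names `find_peaks` and `is_hit` and the parameters `threshold` and `min_peaks` are hypothetical, and it assumes a frame is a non-empty rectangular grid of pixel intensities.

```python
from typing import List, Tuple

Frame = List[List[float]]


def find_peaks(frame: Frame, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels above threshold that are not below any of their neighbours."""
    rows, cols = len(frame), len(frame[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = frame[r][c]
            if value <= threshold:
                continue
            # Compare against the (up to 8) surrounding pixels, clipped at the frame edges.
            neighbours = [
                frame[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value >= v for v in neighbours):
                peaks.append((r, c))
    return peaks


def is_hit(frame: Frame, threshold: float, min_peaks: int) -> bool:
    """Call a frame a 'hit' if it contains at least min_peaks candidate Bragg spots."""
    return len(find_peaks(frame, threshold)) >= min_peaks


if __name__ == "__main__":
    frame = [[0.0] * 5 for _ in range(5)]
    frame[1][1] = 100.0
    frame[3][3] = 80.0
    print(find_peaks(frame, threshold=50.0))
    print(is_hit(frame, threshold=50.0, min_peaks=2))
```

With background noise, a single fixed threshold like this would presumably either miss weak spots or flag noise as peaks, which I take to be the "real life is not so simple" part.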

It’s very cool; I wish I knew more about this. A figure I found for the approximate data rate is 5 GB/s per instrument, which I think is for the European XFEL.

Citation: [1]: Yoon, C.H., White, T.A. (2018). Climbing the Data Mountain: Processing of SFX Data. In: Boutet, S., Fromme, P., Hunter, M. (eds) X-ray Free Electron Lasers. Springer, Cham. https://doi.org/10.1007/978-3-030-00551-1_7


Unfortunately no. I don’t know any research scientists who even make 6 figures. You’re lucky to break even 50k if you’re in academia. Working in industry gets you better pay, but not by too much. This is true even in big pharma, at least on the biochemical/biomedical research front. Perhaps non-research roles are where the big bucks are.


He doesn’t directly control anything with C++; it’s just the data processing. The gist of X-ray crystallography is that we can shoot some X-rays at a crystallised protein, which scatters the X-rays by diffraction. Then we can take the diffraction pattern formed and do some mathemagic to figure out the electron density of the crystallised protein and, from there, work out the protein’s structure.

C++ helps with the mathemagic part of that, especially because by “high throughput”, I mean that the research facility has a particle accelerator that’s over 1 km long, which cost multiple billions because it can shoot super bright X-ray pulses at a rate of up to 27,000 per second. It’s the kind of place that’s used by many research groups, and you have to apply for “beam time”. The sample is piped in front of the beam and the result is thousands of diffraction patterns that need to be matched to particular crystals. That’s where the challenge comes in.

I am probably explaining this badly because it’s pretty cutting edge stuff that’s adjacent to what I know, but I know some of the software used is called CrystFEL. My understanding is that learning C++ was necessary for extending or modifying existing software tools, and for troubleshooting anomalous results.


A friend of mine whose research group works on high-throughput X-ray crystallography had to learn C++ for his work, and he says that it was like “wrangling an unhappy horse”.


Sorry to reply to this so late, I procrastinated because unfortunately my answer is that I don’t know of any communities, perhaps because I’m a scientist who loves maths rather than a mathematician.

However, I will use this opportunity to share some fun stuff from people I like.

https://youtu.be/H0Ek86IH-3Y by Oliver Lugg on YouTube is great. His channel is very eclectic, and there isn’t much pure maths, but I love his shitposting tone, and he has a Discord community that was pretty mathsy when I was in it.

A blog-type site that I enjoy is Tai-Danae Bradley’s https://www.math3ma.com/about, largely because I’ve discovered many other cool researchers through her site.

I also really enjoy Eugenia Cheng’s books, especially as someone who is interested in understanding how to write good scientific communication that is accessible without “dumbing things down”. I recently finished “The Joy of Abstraction”.

Apologies that this isn’t what you actually were looking for. I share your distaste at Reddit: I have used Reddit occasionally for those niche communities that aren’t available elsewhere (yet!), but the atmosphere is increasingly toxic. I fear that smaller communities that flee are congealing in harder to discover places, like Discord.


I had to do it for the first time last year and I was slightly giddy from the novelty of it.


Once upon a time, a thing happened. And then there was a facsimile of narrative conflict, but everything worked out in the end, because that’s how all the short stories by LLMs seem to work.



I have a lot of ebooks that I download for university research, hobby learning and friends who ask for help sourcing books. I put everything in my calibre library, which is great for metadata management (tip: I have it set so new books that I’ve just imported get a tag of “new”, which I remove when I have processed their metadata. This allows me to chip away at ensuring the metadata is correct and good, even if I don’t do it at time of import).

Anyway, at one point I found myself at risk of becoming overwhelmed by books, because if I’m wanting to learn some category theory, for example, I’d have multiple books that seem to be relevant. Some of them were recommended by programmers, some of them assume a higher level of maths background knowledge, some of them are more fun to read — once upon a time I might’ve known which was which, but if there’s a significant gap between me downloading stuff and using it (which is often the case, I’m quite opportunistic with book recommendations), I may forget. Making a note of why I downloaded a particular book is something I’ve been trying to do more, so I can identify the useful things at the right time — the calibre notes field can work for that, but I’m still figuring out how to manage this in a wider sense because I do a lot of reading and it’s easy to forget why I’m reading a particular thing. I think I have a calibre plugin to show which things I’ve read also.

Another related thing is that I will take a cursory look over a book when I download it, and I may delete it and not put it into my calibre library. This feels significant because downloading a book doesn’t make it one of my books, ‘taking it home’ and putting it away on my ‘bookshelf’ makes it mine. In short, I try to be mindful in my curation activities, recognising that doing it in big clumps with my whole collection doesn’t really work and that pruning little and often helps more.


Regex feels distinctly eldritch to me. Like, a lot of computing knowledge feels like magic, but regex feels like the kind of magic you get by consorting with dark forces


Now I’m thinking about an ex-programmer supervillain who does this as her big foray into supervillainy


I wonder what would facilitate people to make their own solutions in this way. Like, I have made a few apps or automation things myself, but if I look at my “normie” friends who don’t have the level of tech familiarity that I do, they struggle with whatever out of the box solutions they can find. Poor IT education is a big part of this, and I’ve been wondering a lot about what would need to change for the average “normie” to be empowered to tinker


When I was at university, the student union had a small fund for creative projects that weren’t related to your degree. Many of the people who applied for cameras also included Adobe licenses on their funding application, because many of them were new to film or photography so they defaulted to what is “industry standard”, because that’s what the majority of online tutorials are available for.


I think the “Moved from Jekyll to Hugo” dot has an implicit catchment area around it, which includes people who don’t technically fit that description, but they’re close. I’ve used neither Jekyll nor Hugo, but the fact I understood that archetype meant I felt pulled in by the gravity of that point.


The thing is that that was how Google became so big in the first place. PageRank was a cool way of trying to filter out the garbage and it worked real well. Even my non techy friends have been getting frustrated with search not working like it used to (even before all this Gemini stuff was added)


I think people like your father make bank because even though new programmers could learn COBOL, that wouldn’t be enough for them to be able to fulfill the same niche your father and other established COBOL programmers occupy; any programming language has a disparity between “the proper way to do things”, and the kind of kludges you see in the field, but few have the kind of baggage that COBOL does, in terms of how long it’s been around and having things built on top of it.


That’s a really cool idea actually. I knew a guy who used to install viruses for fun on a separate machine that wasn’t networked. I bet a more creative person than I could probably figure out a fun learning activity for kids using a “disposable” system


I’m adding this comment on behalf of a couple of friends. One lives in Norway, one lived in India. They told me that both of these places have an issue with accessing media and other digital goods legitimately, often finding themselves willing but unable to pay for something (I was surprised to hear this about Norway; my friend speculates that Norway is small enough that it might simply be forgotten about when big media companies negotiate rights). They both said that VPNs and piracy are way more normalised in their home countries, because it was either that, or miss out on loads of stuff.

Feel it’s useful and important to highlight that the degree to which piracy is normalised depends on where you are.


Ah, you must have access to the same internet library that my Dad used whenever I’d give him my iPod and a list of music, and he’d return it to me full of music. I don’t remember when I realised that he was pirating stuff, probably about the time that I started pirating stuff.


I’m not sure. I don’t plan on having kids, so this is a purely theoretical question that I won’t have to answer in practice, but I think I probably would, at least to some degree.

I had a pretty iconically millennial childhood when it comes to tech; I remember my mum being on the phone to the internet people and asking “he’s offering me an unlimited package for [money] extra. Is that good, do we need that?”, to which my brother and I vigorously nodded. We were young enough we didn’t know shit, but unlimited sounded good and we weren’t paying the bills. My mum probably realised we didn’t know what unlimited vs metered internet meant in practice, and opted for unlimited as the safe option, because if she felt the need to ask her children for advice, she wouldn’t be great at managing a metered connection. That’s the context in which I grew up and is why I’m as techy as I am today.

I learned the hard way, and whilst I don’t think that’s necessarily the best way to learn, I don’t know how one might teach people how to recognise which “download” button to press, and when a dodgy looking site is actually dodgy. It’s like internet street smarts, but what that means has changed since I was a kid, and I don’t necessarily know how I’d teach that beyond the basics, like installing adblockers and other common sense things.


That reminds me of a fairly recent article about research around visualisation systems to aid with interpretable or explainable AI systems (XAI). The idea was that if we can make AI systems that explain their reasoning, then they can be a useful tool, especially in the hands of domain experts.

Turns out that the fancy visualisations that made it easier to understand how the model had come to a conclusion actually made subject matter experts less accurate at catching errors. This surprised researchers and when they later tried to make sense of it, they realised that they had inadvertently dialled up people’s likelihood to trust the model because it looked legit.

One of my favourite aphorisms is “all models are wrong, some are useful.” Seems that the tricky part is figuring out how wrong and how useful.


“The fact that Kratos isn’t the same person he was in the old series is basically the entire point.”

I always feel a little bit sorry for rage bigots like this, because of how dull their world and experiences must be. Like if he felt that the new Kratos felt narratively unsatisfying, or that his journey felt unsatisfying, that’d at least be an opinion with the potential to be interesting. But nah, it’s just “things are different”, with the embedded implication that different = bad.



This reads like a poem, I unironically love this

I am the Rust programmer,
I will rewrite the world in Rust.
I will rewrite the world in Rust
because the world is unsafe.
As I am the Rust programmer
I will keep writing rust
until the world is safe.
After the world is safe,
I will not rewrite it in Rust.
Because I am the Rust programmer
I will retire from programmer in Rust.

I will come to you when you are sleeping,
and I will unlock your computer
using a memory leak.
If I find javascript on your computer,
I will delete them.
Do not try to stop me,
if you try to stop me
I will do it anyways.
I am the Rust programmer,
if you program in javascript,
you will scream.

You will be sleeping
as I rewrite your computer in Rust.
You will not notice me
as I am the Rust programmer,
I am fast,
but not too fast for your computer.
I know your computer
just as it knows me.
After I rewrite your computer,
you will love your computer.
You will love your computer
because it is written in Rust,
I will do the same to all computers because
I am the Rust programmer.

I will not stop at your computer,
I will rewrite the world
because the world is unsafe.
Your brain is written in C,
your memory is unsafe.
If your brain is written in C,
you will forget what I just said.
I will rewrite your brain in Rust,
you cannot stop me from writing Rust
as I am the Rust programmer.
If you try to stop me,
you will not remember it.
Because I am the Rust programmer I can
manually remove your memory,
you will not remember me.
After I rewrite you in Rust,
you will enjoy the world
with a safe memory,
you will not forget
that I am superior,
I am the Rust programmer.

I will rewrite the world,
I will rewrite quantum mechanics
because it is unsafe.
I will not tell you all my plans
before I rewrite you in Rust,
It is because you are made of bugs
I do not trust you.
I am the Rust programmer,
I will rewrite the world in Rust,
you will not forget me
Because I am the Rust programmer.

(n.b. I’m bad at scansion, forgive any poor line break choices)


Though I wonder if, even besides adding an uninterruptible power supply (UPS) (writing the acronym out for anyone else who would’ve had to Google it), this might be a useful exercise in recovering from outages in general. This is coming from someone who hasn’t actually done any self hosting of my own, but you saying you’re still finding down services reminds me of when I learned the benefit of testing system backups as part of making them.

I was lucky in that I didn’t have any data loss, but restoring from my backup took a lot more manual work than I’d anticipated, and it came at an awkward time. Since then, my restoring from backup process is way more streamlined.


Never used Trakt, but I’m a big fan of ListenBrainz. It’s a large part of why I felt able to cancel my Spotify subscription


Man, “too clever” is a phrase that always throws me for a loop, even though I understand what is meant by it; over the years, as I grow wiser, I learn to be less clever. Still weird to think of it this way though


Will still occasionally throw “I’m sorry, I can’t”s but you just gotta remind it to follow the prompt and stay in character

This sounds like a kink scene with bad consent


Thanks for linking to the tracker site, I’ve been meaning to find more ways to audit the amount of trackers in my apps for a while now.


I had a similar experience, but after I eventually figured it out, I grew to appreciate the insert key. Mostly because there were a few times when someone else was getting frustrated with the same problem and I was able to help them. It made me feel powerful; I had suffered, but I now possessed the knowledge to save others from the same fate.


Though to be fair, even this might be progress, of a sort; years ago, I had a girlfriend who had a bunch of Apple products, partly because she worked in sound design. At the time, I had never used Linux and I found using her Mac distinctly unfamiliar. When I eventually tried Linux, some years later, I remember a few instances of going “oh, it’s like on a Mac”.

Those similarities made the whole thing feel a tad less intimidating and probably contributed to (or at least accelerated) me becoming the tech nerd I am today.


I’m glad that you asked this question, because I also was like “wow, seems a bit extreme” before I saw people replying to you that that’s the studio name


This is a great explanation, I wish I could’ve got you to explain a bunch of other network stuff to me back when I was learning


“I refuse to get hyped…”

Ugh, same. I really really really want this to be good. My late best friend introduced me to the first VtM Bloodlines game, as well as VtM more generally. It’d be cool if it did end up being decent, but I don’t think it will



From one odd girl to another, I think both can be true


Do you know if there’s a place where image transcribers on Lemmy are congregating, or are you just doing this independently?


I was using vim for the first time the other day and I was running through the built in vimtutor. I got a call from a friend and they asked what I was up to, and I said I was doing a tutorial for a text editor. At that moment, I felt simultaneously very silly and very smart.


I’m bi, but my appearance is pretty queer coded such that cis-het people tend to read me as “unclear gay or just tech-nerd punk”. I’ve found that when I use the word partner, it can throw people off because they’re clearly fishing for my partner’s gender in a “I can’t tell whether this person is straight or gay” way. Most of the people I’ve dated have been men, but I do like the chaos energy of the confusion